
Simultaneous confidence bands for nonparametric regression with missing covariate data

Published in: Annals of the Institute of Statistical Mathematics

Abstract

We consider a weighted local linear estimator based on the inverse selection probability for nonparametric regression with covariates missing at random. The asymptotic distribution of the maximal deviation between the estimator and the true regression function is derived, and an asymptotically accurate simultaneous confidence band is constructed. The estimator of the regression function is shown to be oracally efficient in the sense that it is uniformly indistinguishable from the estimator obtained when the selection probabilities are known. Finite-sample performance is examined via simulation studies, which support our asymptotic theory. The proposed method is demonstrated via an analysis of a data set from the Canada 2010/2011 Youth Student Survey.
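For concreteness, the estimator described above can be sketched in code. This is a minimal illustration only, assuming an Epanechnikov kernel and externally supplied estimated selection probabilities `pi_hat`; the function name and setup are ours, not the paper's:

```python
import numpy as np

def ipw_local_linear(x0, X, Y, delta, pi_hat, h):
    """Inverse-selection-probability-weighted local linear estimate of m(x0).

    delta[i] = 1 if covariate X[i] is observed; pi_hat[i] estimates the
    selection probability.  Incomplete cases are dropped, since their
    weight delta_i/pi_i vanishes anyway.
    Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1] (an assumed choice).
    """
    obs = delta == 1
    u = (X[obs] - x0) / h
    Kh = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0) / h
    w = Kh / pi_hat[obs]                         # delta_i/pi_i * K_h(X_i - x0)
    D = np.column_stack([np.ones(obs.sum()), X[obs] - x0])  # local linear design
    A = D.T @ (D * w[:, None])                   # X^T W X  (2 x 2)
    b = D.T @ (w * Y[obs])                       # X^T W Y
    beta = np.linalg.solve(A, b)                 # (m_hat(x0), m_hat'(x0))
    return beta[0]
```

The weight \(\delta_i/\pi_i\) is exactly the Horvitz–Thompson correction: complete cases are up-weighted in inverse proportion to their selection probability, so the weighted kernel sums remain (conditionally) unbiased for their full-data counterparts.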


References

  • Al Ahmari, T., Alomar, A., Al Beeybe, J., Asiri, N., Al Ajaji, R., Al Masoud, R., Al-Hazzaa, M. (2017). Associations of self-esteem with body mass index and body image among Saudi college-age females. Eating and Weight Disorders-Studies on Anorexia, Bulimia and Obesity, 1, 1–9.


  • Bickel, P., Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. The Annals of Statistics, 1, 1071–1095.


  • Billingsley, P. (1968). Convergence of Probability Measures. New York: Wiley.


  • Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes. New York: Springer-Verlag.


  • Cai, L., Li, L., Huang, S., Ma, L., Yang, L. (2020). Oracally efficient estimation for dense functional data with holiday effects. Test, 29(1), 282–306. https://doi.org/10.1007/s11749-019-00655-5.

  • Cai, L., Liu, R., Wang, S., Yang, L. (2019). Simultaneous confidence bands for mean and variance functions based on deterministic design. Statistica Sinica, 29, 505–525.

  • Cai, T., Low, M., Ma, Z. (2014). Adaptive confidence bands for nonparametric regression functions. Journal of the American Statistical Association, 109, 1054–1070.


  • Cai, L., Yang, L. (2015). A smooth simultaneous confidence band for conditional variance function. Test, 24, 632–655.


  • Cao, G., Wang, L., Li, Y., Yang, L. (2016). Oracle efficient confidence envelopes for covariance functions in dense functional data. Statistica Sinica, 26, 359–383.


  • Cao, G., Yang, L., Todem, D. (2012). Simultaneous inference for the mean function based on dense functional data. Journal of Nonparametric Statistics, 24, 359–377.


  • Chen, H., Little, R. (1999). Proportional hazards regression with missing covariates. Journal of the American Statistical Association, 94, 896–908.


  • Chernozhukov, V., Chetverikov, D., Kato, K. (2014). Anti-concentration and honest, adaptive confidence bands. The Annals of Statistics, 42, 1787–1818.


  • Claeskens, G., Van Keilegom, I. (2003). Bootstrap confidence bands for regression curves and their derivatives. The Annals of Statistics, 31, 1852–1884.


  • Eubank, R., Speckman, P. (1993). Confidence bands in nonparametric regression. Journal of the American Statistical Association, 88, 1287–1301.


  • Fan, J., Gijbels, I. (1996). Local Polynomial Modeling and Its Applications. London: Chapman and Hall.


  • Fan, J., Zhang, W. (2000). Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scandinavian Journal of Statistics, 27, 715–731.


  • Gu, L., Wang, L., Härdle, W., Yang, L. (2014). A simultaneous confidence corridor for varying coefficient regression with sparse functional data. Test, 23, 806–843.


  • Gu, L., Yang, L. (2015). Oracally efficient estimation for single-index link function with simultaneous confidence band. Electronic Journal of Statistics, 9, 1540–1561.


  • Habib, F., Al Fozan, H., Barnawi, N., Al Motairi, W. (2015). Relationship between body mass index, self-esteem and quality of life among adolescent Saudi females. Journal of Biology, Agriculture and Healthcare, 5, 2224–3208.


  • Hall, P. (1991). On convergence rates of suprema. Probability Theory and Related Fields, 89, 447–455.


  • Hall, P., Titterington, D. (1988). On confidence bands in nonparametric density estimation and regression. Journal of Multivariate Analysis, 27, 228–254.


  • Härdle, W. (1989). Asymptotic maximal deviation of M-smoothers. Journal of Multivariate Analysis, 29, 163–179.


  • Härdle, W., Marron, J. (1991). Bootstrap simultaneous error bars for nonparametric regression. The Annals of Statistics, 19, 778–796.


  • Horvitz, D. G., Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–685.


  • Hosmer, D., Lemeshow, S. (2005). Applied Logistic Regression (2nd ed.). New York: Wiley.


  • Hsu, C., Long, Q., Li, Y., Jacobs, E. (2014). A nonparametric multiple imputation approach for data with missing covariate values with application to colorectal adenoma data. Journal of Biopharmaceutical Statistics, 24, 634–648.


  • Ibrahim, J. G., Chen, M.-H., Lipsitz, S. R., Herring, A. H. (2005). Missing-data methods for generalized linear models: A comparative review. Journal of the American Statistical Association, 100, 332–346.


  • Johnston, G. (1982). Probabilities of maximal deviations for nonparametric regression function estimates. Journal of Multivariate Analysis, 12, 402–414.


  • Kim, J. K., Shao, J. (2013). Statistical Methods for Handling Incomplete Data. London: Chapman and Hall.


  • Liang, H., Wang, S., Robins, J., Carroll, R. (2004). Estimation in partially linear models with missing covariates. Journal of the American Statistical Association, 99, 357–367.


  • Lipsitz, S. R., Ibrahim, J. G., Zhao, L.-P. (1999). A weighted estimating equation for missing covariate data with properties similar to maximum likelihood. Journal of the American Statistical Association, 94, 1147–1160.


  • Little, R., Rubin, D. (2019). Statistical Analysis with Missing Data (3rd ed.). New York: Wiley.


  • Qin, J., Zhang, B., Leung, D. (2009). Empirical likelihood in missing data problems. Journal of the American Statistical Association, 104, 1492–1503.


  • Robins, J., Rotnitzky, A., Zhao, L. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89, 846–866.


  • Rosenblatt, M. (1952). Remarks on a multivariate transformation. The Annals of Mathematical Statistics, 23, 470–472.


  • Silverman, B. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.


  • Song, Q., Yang, L. (2009). Spline confidence bands for variance function. Journal of Nonparametric Statistics, 21, 589–609.


  • Tusnády, G. (1977). A remark on the approximation of the sample df in the multidimensional case. Periodica Mathematica Hungarica, 8, 53–55.


  • Wang, Q. (2009). Statistical estimation in partial linear models with covariate data missing at random. Annals of the Institute of Statistical Mathematics, 61, 47–84.


  • Wang, J. (2012). Modelling time trend via spline confidence band. Annals of the Institute of Statistical Mathematics, 64, 275–301.


  • Wang, C., Wang, S., Carroll, R. (1998). Local linear regression for generalized linear models with missing data. Annals of Statistics, 26, 1028–1050.


  • Wang, C., Wang, S., Zhao, L.-P., Ou, S.-T. (1997). Weighted semiparametric estimation in regression analysis with missing covariate data. Journal of the American Statistical Association, 92, 512–525.


  • Wang, J., Yang, L. (2009). Polynomial spline confidence bands for regression curves. Statistica Sinica, 19, 325–342.


  • Zhao, Z., Wu, W. (2008). Confidence bands in nonparametric time series regression. Annals of Statistics, 36, 1854–1878.


  • Zheng, S., Liu, R., Yang, L., Härdle, W. (2016). Statistical inference for generalized additive models: simultaneous confidence corridors and variable selection. Test, 25, 607–626.


  • Zheng, S., Yang, L., Härdle, W. (2014). A smooth simultaneous confidence corridor for the mean of sparse functional data. Journal of the American Statistical Association, 109, 661–673.


  • Zhou, S., Shen, X., Wolfe, D. (1998). Local asymptotics of regression splines and confidence regions. Annals of Statistics, 26, 1760–1782.



Acknowledgements

We would like to thank the Associate Editor and two referees for their helpful comments and suggestions that substantially improved an earlier version of this manuscript. This research was supported in part by the National Natural Science Foundation of China Award NSFC #11901521, #11701403, First Class Discipline of Zhejiang–A (Zhejiang Gongshang University–Statistics), Zhejiang Province Statistical Research Program #20TJQN04, the 2017 Jiangsu Overseas Visiting Scholar Program for University Prominent Young & Middle-aged Teachers and Presidents, Jiangsu Province Key-Discipline Program (Statistics) GD10700118, the National Natural Science Foundation of China (General Program 11871460, Key Program 11331011 and Program for Creative Research Group in China 61621003), a grant from the Key Lab of Random Complex Structure and Data Science, CAS, China, and the Simons Foundation Mathematics and Physical Sciences Program Award #499650.

Author information


Corresponding author

Correspondence to Suojin Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 207 KB)

Appendices

A. Appendix

We use \(a_{n}\sim b_{n}\) to represent \(\lim _{n\rightarrow \infty }a_{n}/b_{n}=c,\) where c is some nonzero constant. For any function \( \varphi \left( u\right) \) defined on \(\left[ a,b\right] \), let \(\left\| \varphi \left( u\right) \right\| _{\infty }=\left\| \varphi \right\| _{\infty }=\sup _{u\in \left[ a,b\right] }\left| \varphi \left( u\right) \right| \).

A.1 Preliminaries

This section gives some lemmas that are needed in our theoretical development. Their proofs are given in the Supplementary Material.

Lemma 1

(Theorem 1.2 of Bosq (1998)) Let \(\xi _{1},\dots ,\xi _{n}\) be independent random variables with mean 0. If there exists \(c>0\) such that (Cramér's conditions)

$$\begin{aligned} E\left| \xi _{i}\right| ^{k}\le c^{k-2}k!E\xi _{i}^{2}<+\infty ,\quad i=1,\dots ,n,\ k=3,4,\dots , \end{aligned}$$

then for any \(t>0\),

$$\begin{aligned} P\left( \left| \sum \nolimits _{i=1}^{n}\xi _{i}\right| \ge t\right) \le 2\exp \left\{ -\frac{t^{2}}{4\sum \nolimits _{i=1}^{n}E\xi _{i}^{2}+2ct}\right\} . \end{aligned}$$

Lemma 2

Under Assumptions (A1)–(A5), for any integer \(l\ge 0\), as \(n\rightarrow \infty \), one has

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] }\left| n^{-1}\sum \limits _{i=1}^{n} \frac{\delta _{i}}{\pi _{i}}K_{h}\left( X_{i}-x\right) \left( X_{i}-x\right) ^{l}\varepsilon _{i}\right| =O_{p}\left( n^{-1/2}h^{l-1/2}\log ^{1/2}n\right) . \end{aligned}$$

In the following, we discuss the representations of the weighted estimators \( {\hat{m}}\left( x,\pi \right) \) and \({\hat{m}}\left( x,\hat{\pi }\right) \), and decompose the errors \({\hat{m}}\left( x,\pi \right) -m\left( x\right) \) and \({\hat{m}}\left( x,\hat{\pi }\right) -m\left( x\right) \) into simpler parts to prove Theorems 1 and 3. Let

$$\begin{aligned} L_{n,l}\left( x\right) =n^{-1}\sum \limits _{i=1}^{n}\frac{\delta _{i}}{\pi _{i}}K_{h}\left( X_{i}-x\right) \left( X_{i}-x\right) ^{l},\quad l=0,1,2, \end{aligned}$$

and

$$\begin{aligned} M_{n,l}\left( x\right) =n^{-1}\sum \limits _{i=1}^{n}\frac{\delta _{i}}{\pi _{i}}K_{h}\left( X_{i}-x\right) \left( X_{i}-x\right) ^{l}\left\{ Y_{i}-m\left( x\right) -m^{\left( 1\right) }\left( x\right) \left( X_{i}-x\right) \right\} ,\quad l=0,1. \end{aligned}$$

Then

$$\begin{aligned} {\mathbf {X}}^{T}\mathbf {WX}\!=\!\!\left( \begin{array}{cc} \!L_{n,0}\left( x\right) &{} L_{n,1}\left( x\right) \! \\ \!L_{n,1}\left( x\right) &{} L_{n,2}\left( x\right) \! \end{array} \right) , \\ {\mathbf {X}}^{T}{\mathbf {W}}\!\left( \!{\mathbf {Y}}\!\!-\!m\left( x\right) {\mathbf {X}}e_{0}\!-\!m^{\left( 1\right) }\left( x\right) {\mathbf {X}} e_{1}\!\right) \! \!=\!\!\left( \begin{array}{c} \!M_{n,0}\left( x\right) \!\!\\ \!M_{n,1}\left( x\right) \!\! \end{array} \right) , \end{aligned}$$

where \(e_{1}=\left( 0,1\right) ^{T}\). By (4), one then has

$$\begin{aligned} {\hat{m}}\left( x,\pi \right) -m\left( x\right)= & {} e_{0}^{T}\left( {\mathbf {X}} ^{T}\mathbf {WX}\right) ^{-1}{\mathbf {X}}^{T}{\mathbf {W}}\left( \mathbf {Y-} m\left( x\right) {\mathbf {X}}e_{0}-m^{\left( 1\right) }\left( x\right) \mathbf { X}e_{1}\right) \nonumber \\= & {} e_{0}^{T}\left( \begin{array}{cc} \!L_{n,0}\left( x\right) &{} L_{n,1}\left( x\right) \! \\ \!L_{n,1}\left( x\right) &{} L_{n,2}\left( x\right) \! \end{array} \right) ^{-1}\!\left( \begin{array}{c} \!\!M_{n,0}\left( x\right) \!\!\! \\ \!\!M_{n,1}\left( x\right) \!\!\! \end{array} \right) . \end{aligned}$$
(11)
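Representation (11) reduces the weighted estimator to a \(2\times 2\) linear system in \(L_{n,l}\) and \(M_{n,l}\). A quick numerical check of this linear algebra (the simulated setup and all names are our own assumptions) confirms that the explicit \(2\times 2\) route agrees with a direct weighted least squares solve:

```python
import numpy as np

rng = np.random.default_rng(0)
n, x0, h = 200, 0.5, 0.3
X = rng.uniform(0, 1, n)
Y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(n)
delta = rng.binomial(1, 0.8, n)
pi = np.full(n, 0.8)

u = (X - x0) / h
Kh = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0) / h  # Epanechnikov
w = delta / pi * Kh                       # delta_i/pi_i * K_h(X_i - x0)

# Route 1: the explicit 2x2 system, as in (11), applied to Y
L = [np.mean(w * (X - x0) ** l) for l in range(3)]
M = [np.mean(w * (X - x0) ** l * Y) for l in range(2)]
A = np.array([[L[0], L[1]], [L[1], L[2]]])
m_hat_1 = np.linalg.solve(A, np.array(M))[0]

# Route 2: weighted least squares on the local linear design
Xd = np.column_stack([np.ones(n), X - x0])
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(Xd * sw[:, None], Y * sw, rcond=None)
m_hat_2 = beta[0]
```

Both routes minimize the same weighted sum of squares, so `m_hat_1` and `m_hat_2` coincide up to floating-point error.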

To further study \({\hat{m}}\left( x,\hat{\pi }\right) \), let

$$\begin{aligned} {\hat{L}}_{n,l}\left( x\right) =n^{-1}\sum \limits _{i=1}^{n}\frac{\delta _{i}}{\hat{\pi }_{i}}K_{h}\left( X_{i}-x\right) \left( X_{i}-x\right) ^{l},\quad l=0,1,2, \end{aligned}$$

and

$$\begin{aligned} {\hat{M}}_{n,l}\left( x\right) =n^{-1}\sum \limits _{i=1}^{n}\frac{\delta _{i}}{\hat{\pi }_{i}}K_{h}\left( X_{i}-x\right) \left( X_{i}-x\right) ^{l}\left\{ Y_{i}-m\left( x\right) -m^{\left( 1\right) }\left( x\right) \left( X_{i}-x\right) \right\} ,\quad l=0,1. \end{aligned}$$

By (5), one then obtains that

$$\begin{aligned} {\hat{m}}\left( x,\hat{\pi }\right) \!-\!m\left( x\right) \!= & {} \!e_{0}^{T}\left( {\mathbf {X}}^{T}{\hat{\mathbf {W}}{} \mathbf{X}}\right) ^{-1}{\mathbf {X}} ^{T}{\hat{\mathbf {W}}}\left( \mathbf {Y-}m\left( x\right) {\mathbf {X}} e_{0}-m^{\left( 1\right) }\left( x\right) {\mathbf {X}}e_{1}\right) \nonumber \\= & {} e_{0}^{T}\left( \begin{array}{cc} \!{\hat{L}}_{n,0}\left( x\right) &{} {\hat{L}}_{n,1}\left( x\right) \! \\ \!{\hat{L}}_{n,1}\left( x\right) &{} {\hat{L}}_{n,2}\left( x\right) \! \end{array} \right) ^{-1}\!\left( \begin{array}{c} {\hat{M}}_{n,0}\left( x\right) \\ {\hat{M}}_{n,1}\left( x\right) \end{array} \right) . \end{aligned}$$
(12)

Lemma 3

Under Assumptions (A1) and (A3)–(A5), as \(n\rightarrow \infty \), uniformly for all \( x \in \left[ a_{0},b_{0}\right] \), one has

$$\begin{aligned} L_{n,l}\left( x\right) =h^{l}f_{X}\left( x\right) \mu _{l}\left( K\right) +u_{p}\left( h^{l+1}\right) +U_{p}\left( n^{-1/2}h^{l-1/2}\log ^{1/2}n\right) ,l=0,1,2. \end{aligned}$$

Lemma 4

Under Assumptions (A1)–(A5), as \(n\rightarrow \infty \), uniformly for all \( x \in \left[ a_{0},b_{0}\right] \), one has

and

$$\begin{aligned} M_{n,1}\left( x\right) =U_{p}\left( n^{-1/2}h^{1/2}\log ^{1/2}n\right) . \end{aligned}$$

Lemma 5

Under Assumptions (A1)–(A5), as \(n\rightarrow \infty \), one has

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] }\left| {\hat{L}}_{n,l}\left( x\right) -L_{n,l}\left( x\right) \right| =O_{p}\left( n^{-1/2}\right) ,\quad l=0,1,2, \end{aligned}$$

and

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] }\left| {\hat{M}}_{n,l}\left( x\right) -M_{n,l}\left( x\right) \right| =O_{p}\left( n^{-1/2}\right) ,\quad l=0,1. \end{aligned}$$

A.2 Conditional limiting extreme value distribution of \(V_{n}(x)\)

This section contains the main steps in deriving the conditional extreme value distribution of \(V_{n}(x)=n^{-1}f_{X}^{-1}\left( x\right) \sum \nolimits _{i=1}^{n}\frac{\delta _{i}}{\pi _{i}}K_{h}\left( X_{i}-x\right) \varepsilon _{i}\), stated in Theorem 6 at the end of this section, which will be used in the total probability formula in the proof of Theorem 2.

The Rosenblatt quantile transformation in Rosenblatt (1952) is adopted with

$$\begin{aligned} T\left( X,\varepsilon \right) =\left( X^{*},\varepsilon ^{*}\right) =\left( F_{X|\delta =1}\left( X\right) ,F_{\varepsilon |X,\delta =1}\left( \varepsilon |X\right) \right) , \end{aligned}$$

where \(F_{X|\delta =1}\left( X\right) \) is the conditional distribution function of X given \(\delta =1\) and \(F_{\varepsilon |X,\delta =1}\left( \varepsilon |X\right) \) is the conditional distribution function of \( \varepsilon \) given X and \(\delta =1\). This transformation produces mutually independent uniform random variables \(\left( X^{*},\varepsilon ^{*}\right) \) on \(\left[ 0,1\right] ^{2}\). According to the strong approximation theorem in Tusnády (1977, Theorem 1), there exists a sequence of two-dimensional Brownian bridges \(B_{n}\) such that

$$\begin{aligned} \sup \nolimits _{x,\varepsilon }\left| Z_{n}\left( x,\varepsilon \right) -B_{n}\left( T\left( x,\varepsilon \right) \right) \right| =O_{a.s.}\left( n^{-1/2}\log ^{2}n\right) , \end{aligned}$$
(13)

where \(Z_{n}\left( x,\varepsilon \right) =n^{1/2}\left\{ F_{n}\left( x,\varepsilon \right) -F_{X,\varepsilon |\delta =1}\left( x,\varepsilon \right) \right\} \), with \(F_{n}\left( x,\varepsilon \right) \) and \( F_{X,\varepsilon |\delta =1}\left( x,\varepsilon \right) \) denoting the empirical and theoretical distributions of \(\left( X,\varepsilon \right) \) given \(\delta =1\). The transformation and the strong approximation results have also been used in Johnston (1982), Härdle (1989), and Wang and Yang (2009) to construct SCBs for nonparametric regression when the data are fully observed.
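The Rosenblatt transformation can be illustrated numerically. Here we assume a jointly Gaussian pair, for which both conditional distribution functions are available in closed form; the setup is ours, purely for illustration:

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(1)
n, rho = 50_000, 0.7
X = rng.standard_normal(n)
eps = rho * X + sqrt(1 - rho ** 2) * rng.standard_normal(n)  # eps | X ~ N(rho X, 1 - rho^2)

# Rosenblatt transform: (X, eps) -> (F_X(X), F_{eps|X}(eps|X))
U1 = np.array([Phi(x) for x in X])
U2 = np.array([Phi((e - rho * x) / sqrt(1 - rho ** 2)) for e, x in zip(eps, X)])
```

The original pair is strongly dependent (correlation \(\rho = 0.7\)), while the transformed pair `(U1, U2)` is exactly independent Uniform\([0,1]^2\) by construction, which the sample statistics below confirm.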

To obtain the distribution of \(\sup _{x\in \left[ a_{0},b_{0} \right] }\left| V_n(x)\right| \) conditional on \(\Delta _{n}=n_{0}\), we will establish Lemmas 6–8 below. Here \(\{n_{0}\}\) is a sequence of numbers related to n with \(1\le n_{0}\le n\). By (6), there exists a constant \(r>0\) such that \(r \le \Delta _n/n\le 1\) in probability as \(n \rightarrow \infty \). Thus we only need to consider \(n_0 \ge rn\); that is, \(n_0\) and n are of the same order as \(n \rightarrow \infty \). Therefore, to unify the notation, we will use n in the convergence rates below.

Meanwhile, because the data are i.i.d., conditioning on \(\Delta _{n}=\sum _{i=1}^{n}\delta _{i}=n_{0}\) is equivalent to conditioning on the event that exactly \(n_0\) elements of \(\varvec{\delta }_{n}=\left( \delta _{1},\dots ,\delta _{n}\right) ^{T}\) equal 1 and the remaining \((n-n_0)\) elements equal 0. Without loss of generality, let \(\delta _{i}=1\) for \(i=1,\dots ,n_{0}\) and \(\delta _{i}=0\) for \(i=n_{0}+1,\dots ,n\).

Notice that, for \(i=1,\dots ,n\),

Thus, conditional on \(\Delta _{n}=n_{0}\), \(1\le n_{0}\le n\), by symmetry one has

and

(14)

Moreover, as discussed above, conditional on \(\Delta _{n}=n_{0}\) one can let \(\delta _{i}=1\) for \(i=1,\dots ,n_{0}\) and \(\delta _{i}=0\) for \(i=n_{0}+1,\dots ,n\) without loss of generality. Then conditional on \(\Delta _{n}=n_{0}\) one can write

$$\begin{aligned} V_{n}\left( x\right) =n^{-1}f_{X}^{-1}\left( x\right) \sum \limits _{i=1}^{n}\frac{\delta _i}{\pi _{i}}K_{h}\left( X_{i}\!-\!x\right) \varepsilon _{i} \\ =n^{-1}f_{X}^{-1}\left( x\right) \sum \limits _{i=1}^{n_{0}}\frac{1}{\pi _{i}}K_{h}\left( X_{i}\!-\!x\right) \varepsilon _{i}. \end{aligned}$$

Conditional on \(\Delta _{n}=n_{0}\), we now introduce the following standardized stochastic process:

$$\begin{aligned} \zeta _{1n_{0}}\left( x\right) =h^{1/2}s^{-1/2}\left( x\right) n_{0}^{-1/2}\sum \limits _{i=1}^{n_{0}}\frac{1}{\pi _{i}}K_{h}\left( X_{i}-x\right) \varepsilon _{i}, \end{aligned}$$
(15)

which can be rewritten as

$$\begin{aligned} \zeta _{1n_{0}}\left( x\right) =h^{1/2}s^{-1/2}\left( x\right) \int \int \frac{1}{\pi \left( m\left( u\right) +\varepsilon \right) }K_{h}\left( u-\!x\right) \varepsilon dZ_{n_{0}}\left( u,\varepsilon \right) , \end{aligned}$$

where \(Z_{n_{0}}\left( u,\varepsilon \right) \) is the same as \(Z_{n}\left( u,\varepsilon \right) \) in (13) but with n replaced by \(n_{0}\).

Let \(\kappa _{n}=n^{\theta }\) with \(\frac{2}{3\eta }<\theta <\frac{1}{6}\) where \(\eta >4\) is given in Assumption (A2), which together with Assumption (A5) implies that

$$\begin{aligned} \kappa _{n}^{-\eta }h^{-2} \log n =O\left( 1\right) ,\quad \kappa _{n}^{2}n^{-1/2}h^{-1/2}\left( \log n\right) ^{5/2}=o\left( 1\right) . \end{aligned}$$
(16)

Then conditional on \(\Delta _{n}=n_{0}\) one can define the following processes to approximate \(\zeta _{1n_{0}}\left( x\right) \):

$$\begin{aligned} \zeta _{2n_{0}}\left( x\right)= & {} h^{1/2}s_{n}^{-1/2}\left( x\right) \int \int _{\left| \varepsilon \right| \le \kappa _{n}}\frac{1}{\pi \left( m\left( u\right) +\varepsilon \right) }K_{h}\left( u-\!x\right) \varepsilon dZ_{n_{0}}\left( u,\varepsilon \right) , \\ \zeta _{3n_{0}}\left( x\right)= & {} h^{1/2}s_{n}^{-1/2}\left( x\right) \int \int _{\left| \varepsilon \right| \le \kappa _{n}}\frac{1}{\pi \left( m\left( u\right) +\varepsilon \right) }K_{h}\left( u-\!x\right) \varepsilon dB_{n_{0}}\left( T\left( u,\varepsilon \right) \right) , \\ \zeta _{4n_{0}}\left( x\right)= & {} h^{1/2}s_{n}^{-1/2}\left( x\right) \int \int _{\left| \varepsilon \right| \le \kappa _{n}}\frac{1}{\pi \left( m\left( u\right) +\varepsilon \right) }K_{h}\left( u-\!x\right) \varepsilon dW_{n_{0}}\left( T\left( u,\varepsilon \right) \right) , \end{aligned}$$

where \(s_{n}\left( x\right) =\int _{\left| \varepsilon \right| \le \kappa _{n}}\frac{\varepsilon ^{2}}{\pi ^{2}\left( m\left( x\right) +\varepsilon \right) }f_{X,\varepsilon |\delta =1}\left( x,\varepsilon \right) d\varepsilon ,\) \(B_{n_{0}}\left( T\left( u,\varepsilon \right) \right) \) is the sequence of Brownian bridges in (13) and \(W_{n_{0}} \left( T\left( u,\varepsilon \right) \right) \) is the sequence of Wiener processes satisfying \( B_{n_{0}}\!\left( u,s\right) \!=\!W_{n_{0}}\!\left( u,s\right) \) \( -usW_{n_{0}}\left( 1,1\right) \). Moreover, define

$$\begin{aligned} \zeta _{5n_{0}}\left( x\right) =h^{1/2}s_{n}^{-1/2}\left( x\right) \int s_{n}^{1/2}\left( u\right) K_{h}\left( u-\!x\right) dW\left( u\right) , \end{aligned}$$

and

$$\begin{aligned} \zeta _{6n_{0}}\left( x\right) =h^{1/2}\int K_{h}\left( u-\!x\right) dW\left( u\right) , \end{aligned}$$

where \(W\left( u\right) \) is a two-sided Wiener process on \(\left( -\infty ,+\infty \right) \). Conditional on \(\Delta _{n}=n_{0}\), according to Theorem 3.1 in Bickel and Rosenblatt (1973), one has

$$\begin{aligned} P\!\left[ \!\left. a_{h}\left\{ \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \zeta _{6n_{0}}\left( x\right) \right| /\lambda ^{1/2}\left( K\right) -b_{h}\right\} \!\le t\right| \Delta _{n}=n_{0} \right] \! \rightarrow \exp \left\{ -2\exp \left( -t\right) \right\} \nonumber \\ \end{aligned}$$
(17)

\(\forall t\in \mathbb {R}\), as \(n_0 \) (and thus n) \(\rightarrow \infty \). Here \(a_{h},b_{h}\), and \(\lambda \left( K\right) \) are given in Theorem 2.
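For a level \(1-\alpha \) band, the Gumbel limit above is inverted to obtain the critical value. A minimal sketch of that inversion follows; the exact \(a_{h}\) and \(b_{h}\) are given in Theorem 2 (not reproduced in this appendix) and enter only as pass-through arguments here:

```python
import math

def scb_critical_value(alpha, a_h, b_h):
    """Invert the Gumbel limit exp{-2 exp(-t)} = 1 - alpha for t, and return
    the quantity b_h + t_alpha / a_h that scales the SCB half-width."""
    t_alpha = -math.log(-math.log(1.0 - alpha) / 2.0)
    return b_h + t_alpha / a_h

# Sanity check of the inversion itself: for alpha = 0.05,
# t_alpha = -log(-log(0.95)/2) is roughly 3.66.
t95 = -math.log(-math.log(0.95) / 2.0)
```

Note the slow \(\log \)-type scaling: since \(a_{h},b_{h}\) grow like \(\left( -2\log h\right) ^{1/2}\) in classical Bickel–Rosenblatt theory (the paper's exact expressions are in Theorem 2), the band is only slightly wider than a pointwise interval.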

The proofs of the following Lemmas 6 and 7 are given in the Supplementary Material due to space limitations.

Lemma 6

Under Assumptions (A1)–(A5), conditional on \( \Delta _{n}=n_{0}\), for an increasing sequence \(\{n_0\}\), as \(n_0 \rightarrow \infty \), one has

$$\begin{aligned} (a)&\quad \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \zeta _{2n_{0}}\left( x\right) -\zeta _{3n_{0}}\left( x\right) \right| =o_{p}\left( \log ^{-1/2}n\right) , \\ (b)&\quad \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \zeta _{3n_{0}}\left( x\right) -\zeta _{4n_{0}}\left( x\right) \right| =o_{p}\left( \log ^{-1/2}n\right) , \\ (c)&\quad \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \zeta _{5n_{0}}\left( x\right) -\zeta _{6n_{0}}\left( x\right) \right| =o_{p}\left( \log ^{-1/2}n\right) . \end{aligned}$$

Lemma 7

Conditional on \(\Delta _{n}=n_{0}\) for an increasing sequence \(\{n_0\}\), the stochastic processes \(\zeta _{4n_{0}}\left( x\right) \) and \(\zeta _{5n_{0}}\left( x\right) \) have the same asymptotic distribution as \(n_0 \rightarrow \infty \).

Lemmas 6 and 7, expression (17), and Slutsky’s Theorem imply that

$$\begin{aligned} P\left[ \left. a_{h}\left\{ \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \zeta _{2n_{0}}\left( x\right) \right| /\lambda ^{1/2}\left( K\right) \!-\!b_{h}\right\} \le t\right| \Delta _{n}=n_{0} \right] \! \rightarrow \exp \! \left\{ -2\exp \left( -t\right) \right\} \nonumber \\ \end{aligned}$$
(18)

\(\forall t\in \mathbb {R}\), as \(n_0 \rightarrow \infty \).

Lemma 8

Under Assumptions (A1)–(A5), conditional on \( \Delta _{n}=n_{0}\) for an increasing sequence \(\{n_0\}\), one has

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \zeta _{1n_{0}}\left( x\right) -\zeta _{2n_{0}}\left( x\right) \right| =o_{p}\left( \log ^{-1/2}n\right) , \end{aligned}$$

as \(n_0 \rightarrow \infty \).

Proof of Lemma 8. Define

$$\begin{aligned} \zeta _{1n_{0}}^{*}\left( x\right) =h^{1/2}s^{-1/2}\left( x\right) \int \int _{\left| \varepsilon \right| \le \kappa _{n}}\frac{1}{\pi \left( m\left( u\right) +\varepsilon \right) }K_{h}\left( u-\!x\right) \varepsilon dZ_{n_{0}}\left( u,\varepsilon \right) . \end{aligned}$$

To prove the lemma, it is sufficient to prove that conditional on \( \Delta _{n}=n_{0}\)

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \zeta _{1n_{0}}\left( x\right) -\zeta _{1n_{0}}^{*}\left( x\right) \right| =o_{p}\left( \log ^{-1/2}n\right) \end{aligned}$$
(19)

and

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \zeta _{2n_{0}}\left( x\right) -\zeta _{1n_{0}}^{*}\left( x\right) \right| =o_{p}\left( \log ^{-1/2}n\right) \end{aligned}$$
(20)

as \(n_0 \rightarrow \infty \). In the following, we first show (20). By (18) and the fact that \(b_{h}=O\left( \log ^{1/2}n\right) \), one has \(\sup _{x\in \left[ a_{0},b_{0}\right] }\left| \zeta _{2n_{0}}\left( x\right) \right| =O_{p}\left( \log ^{1/2}n\right) \), which, together with (S.6) in the Supplementary Material, implies that

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] } \left| \zeta _{2n_{0}}\left( x\right) \! - \! \zeta _{1n_{0}}^{*}\left( x\right) \right|= & {} \sup _{x\in \left[ a_{0},b_{0}\right] }\Bigg | h^{1/2}\left\{ s^{-1/2}\left( x\right) -s_{n}^{-1/2}\left( x\right) \right\} \\&\times \!\left. \int \!\int _{\left| \varepsilon \right| \le \kappa _{n}} \! \frac{1}{\pi \left( m\left( u\right) \!+\! \varepsilon \right) }K_{h}\left( u\! -\!x\right) \varepsilon dZ_{n_{0}}\left( u,\varepsilon \right) \right| \\= & {} O_{p}\left( h^{2}\log ^{-1/2}n\right) =o_{p}\left( \log ^{-1/2}n\right) . \end{aligned}$$

We next prove (19). Notice that

For convenience, we denote

To prove (19), it is sufficient to verify that

By Theorem 15.6 in Billingsley (1968), it suffices to show: (i) conditional on \(\Delta _{n}=n_{0}\), the difference \(\zeta _{1n_{0}}\left( x\right) -\zeta _{1n_{0}}^{*}\left( x\right) \rightarrow 0\) in probability for any given \(x\in \left[ a_{0},b_{0}\right] \), and (ii) the tightness of this difference process conditional on \(\Delta _{n}=n_{0}\), using the following moment condition:

for any \(x\in \left[ x_{1},x_{2}\right] \) and some constant \( C>0\) that is independent of \(n_0\).

Firstly, note that \(\varsigma _{i,n}\left( x\right) ,1\le i\le n\), are independent variables with and

Thus, by (16), one has ,

Secondly, notice that

Since \(K\left( u\right) \) \(\in C^{\left( 1\right) }\left[ -1,1\right] \) by Assumption (A3),

and

for some constant \(C_{1}>0\) that is independent of \(n_0\). Therefore, by the Schwarz inequality, one has that

which together with (16) concludes that

for some \(C>0\) that is independent of \(n_0\), verifying the tightness. The proof is completed. \(\Box \)

By the definitions of \( V_{n}\left( x\right) \) in Theorem 1 and \(\zeta _{1n_{0}}(x)\) in (15), one has \(\zeta _{1n_{0}}\left( x\right) =\left( nh\right) ^{1/2}r_{n}^{-1/2}s^{-1/2}\left( x\right) f_{X}\left( x\right) V_{n}\left( x\right) \) given \(\Delta _{n}=n_{0}\). This together with Lemma 8, expression (18), and Slutsky’s Theorem concludes the following result.

Theorem 6

Under Assumptions (A1)–(A5), one has that, for any \(t\in \mathbb {R}\), as \( n_{0}\rightarrow \infty ,\)

$$\begin{aligned} P\left[ a_{h}\left\{ \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \left( nh\right) ^{1/2}r_{n}^{-1/2}V_{n}\left( x\right) /d^{1/2}\left( x\right) \right| -b_{h}\right\} \le t\bigg \vert \Delta _{n}=n_{0}\right] \nonumber \\ \rightarrow \exp \left\{ -2\exp \left( -t\right) \right\} . \end{aligned}$$
(21)

A.3 Proofs of the theorems in Section 2

Proof of Theorem 1. By Lemma 3 and Assumption (A5), one has

$$\begin{aligned} {\mathbf {X}}^{T}\mathbf {WX=}\left( \begin{array}{cc} L_{n,0}\left( x\right) &{} L_{n,1}\left( x\right) \\ L_{n,1}\left( x\right) &{} L_{n,2}\left( x\right) \end{array} \right) \!=\!f_{X}\left( x\right) \left( \begin{array}{cc} 1+u_p(h) &{}U_p(h^{2})\\ U_p(h^{2}) &{} h^{2} \mu _{2}\left( K\right) +u_p(h^3) \end{array} \right) \end{aligned}$$

which implies that

$$\begin{aligned} \left( {\mathbf {X}}^{T}\mathbf {WX}\right) ^{-1}=f_{X}^{-1}\left( x\right) \left( \begin{array}{cc} 1+u_{p}\left( h\right) &{} U_{p}\left( 1\right) \\ U_{p}\left( 1\right) &{} h^{-2}\mu _{2}^{-1}\left( K\right) +u_{p}\left( h^{-1}\right) \end{array} \right) . \end{aligned}$$

This, together with (11) and Lemmas 2 and 4, implies that, uniformly for all \(x\in \left[ a_{0},b_{0}\right] \),

The proof is completed. \(\Box \)

Proof of Theorem 2. According to Theorem 6, for any \(t\in \mathbb {R}\) , as \(n_{0}\rightarrow \infty \),

$$\begin{aligned} P\left[ a_{h}\left. \left\{ \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \left( nh\right) ^{1/2}r_{n}^{-1/2}V_{n}\left( x\right) /d^{1/2}\left( x\right) \right| -b_{h}\right\} \le t\right| \Delta _{n}=n_{0}\right] \\ \rightarrow \exp \left\{ -2\exp \left( -t\right) \right\} . \end{aligned}$$

Thus one has that for any given \(\epsilon >0\) and \(t\in \mathbb {R}\), there exists \(N_{0}>0\) such that

$$\begin{aligned} \left| P\left[ a_{h}\left\{ \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \left( nh\right) ^{1/2}r_{n}^{-1/2}V_{n}\left( x\right) /d^{1/2}\left( x\right) \right| -b_{h}\right\} \le t\,\middle \vert \,\Delta _{n}=n_{0}\right] -\exp \left\{ -2\exp \left( -t\right) \right\} \right| <\frac{\epsilon }{2} \end{aligned}$$

for all \(n_{0}\ge N_{0}\). On the other hand, since \(\Delta _{n}/n\rightarrow P\left( \delta _{1}=1\right) >0\) a.s., there exists \( N>N_{0}\) such that when \(n\ge N\), \(P\left( \Delta _{n}\ge N_{0}\right) >1-\epsilon /2\). Therefore, unconditional on \(\Delta _{n}\), for \(n\ge N\),

$$\begin{aligned} \left| P\!\left[ \!a_{h}\!\left\{ \! \sup _{x\in \left[ a_{0},b_{0}\right] }\!\left| \left( nh\right) ^{1/2}r_{n}^{-1/2}V_{n}\!\left( x\right) \! /d^{1/2}\left( x\right) \right| \! -b_{h}\!\right\} \!\le t\right] \!-\exp \left\{ -2\exp \left( -t\right) \right\} \right| \\ \le \sum _{n_{0}=1}^{n}\!\left| P\!\left[ a_{h}\!\left. \left\{ \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \left( nh\right) ^{1/2}r_{n}^{-1/2}V_{n}\!\left( x\right) /d^{1/2}\left( x\right) \right| -b_{h}\right\} \le t\right| \Delta _{n}=n_{0}\right] \right. \\ \left. -\exp \left\{ -2\exp \left( -t\right) \right\} \right. \Bigg \vert \times P(\Delta _{n}=n_{0})+P(\Delta _{n}=0) \\ \le \!\sum _{n_{0}=N_{0}}^{n}\!\left| P\!\left[ a_{h}\!\left. \! \left\{ \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \left( nh\right) ^{1/2}r_{n}^{-1/2}V_{n}\!\left( x\right) /d^{1/2}\left( x\right) \right| \! -b_{h}\!\right\} \le t\right| \Delta _{n}=n_{0}\right] \right. \\ \left. -\exp \left\{ -2\exp \left( -t\right) \right\} \right. \Bigg \vert \times P(\Delta _{n}=n_{0})+\frac{\epsilon }{2}<\epsilon . \end{aligned}$$

This, together with the fact that the dominant term of \( {\hat{m}}\left( x,\pi \right) -m\left( x\right) \) is \( V_{n}(x) \) as shown in the proof of Theorem 1, completes the proof of Theorem 2. \(\Box \)
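In practice, the extreme-value limit \(\exp \left\{ -2\exp \left( -t\right) \right\} \) just derived is what makes the band computable: for a nominal level \(1-\alpha \), one solves \(\exp \left\{ -2\exp \left( -t\right) \right\} =1-\alpha \) for \(t\). A minimal sketch (the function name is ours, not from the paper):

```python
import math

def gumbel_quantile(alpha):
    """Solve exp(-2*exp(-t)) = 1 - alpha for t, the critical value
    of the limiting extreme-value distribution in Theorem 2."""
    return -math.log(-0.5 * math.log(1.0 - alpha))
```

For example, `gumbel_quantile(0.05)` is roughly 3.66, noticeably larger than the pointwise normal quantile 1.96, reflecting the simultaneity of the band.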

Proof of Theorem 3. By (11) and (12) one has

$$\begin{aligned}&{\hat{m}}\left( x,\pi \right) -{\hat{m}}\left( x,\hat{\pi }\right) =e_{0}^{T}\left( \begin{array}{cc} \!L_{n,0}\left( x\right) &{} L_{n,1}\left( x\right) \! \\ \!L_{n,1}\left( x\right) &{} L_{n,2}\left( x\right) \! \end{array} \right) ^{-1}\left( \begin{array}{c} \!\!M_{n,0}\left( x\right) \!\!\! \\ \!\!M_{n,1}\left( x\right) \!\!\! \end{array} \right) \\&-e_{0}^{T}\left( \begin{array}{cc} \!{\hat{L}}_{n,0}\left( x\right) &{} {\hat{L}}_{n,1}\left( x\right) \! \\ \!{\hat{L}}_{n,1}\left( x\right) &{} {\hat{L}}_{n,2}\left( x\right) \! \end{array} \right) ^{-1}\left( \begin{array}{c} \!\!{\hat{M}}_{n,0}\left( x\right) \!\!\! \\ \!\!{\hat{M}}_{n,1}\left( x\right) \!\!\! \end{array} \right) . \end{aligned}$$

By Lemma 5, it is easily seen that

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] }\left| {\hat{m}}\left( x,\pi \right) -{\hat{m}}\left( x,\hat{\pi }\right) \right| =O_{p}\left( n^{-1/2}\right) , \end{aligned}$$

completing the proof. \(\Box \)

Proof of Theorem 5. By definition,

$$\begin{aligned} {\hat{d}}_{n}\left( x\right)= & {} \frac{n}{\Delta _{n}}{\hat{f}}_{X}^{-2}\left( x\right) \frac{h}{n} \mathop {\displaystyle \sum }\limits _{i=1}^{n}\frac{\delta _{i}}{\hat{\pi }_{i}^{2} }K_{h}^{2}\left( X_{i}\!-\!x\right) \hat{\varepsilon }_{i}^{2}. \end{aligned}$$
(22)
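Estimator (22) can be computed directly from the observed data. A minimal sketch, assuming an Epanechnikov kernel and treating the fitted values \({\hat{m}}\left( X_{i},\hat{\pi }\right) \), \({\hat{f}}_{X}\left( x\right) \), and \(\hat{\pi }_{i}\) as given inputs (all names are illustrative):

```python
import numpy as np

def dhat_n(x, X, Y, delta, pi_hat, m_hat, f_hat, h):
    """Variance estimator (22): inverse-selection-probability-weighted
    kernel average of squared residuals, normalized by f_hat(x)^2.
    m_hat: array of fitted values at X_i; f_hat: scalar density estimate at x."""
    K = lambda u: 0.75 * np.maximum(1.0 - u**2, 0.0)   # Epanechnikov kernel
    Kh = K((X - x) / h) / h                            # K_h(X_i - x)
    eps2 = (Y - m_hat)**2                              # squared residuals
    n, Dn = len(X), delta.sum()
    return (n / Dn) * f_hat**-2 * (h / n) * np.sum(delta / pi_hat**2 * Kh**2 * eps2)
```

When all observations are complete (\(\delta _{i}=1\), \(\hat{\pi }_{i}=1\)), this reduces to the usual kernel variance estimator for local linear regression.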

Firstly, we study the uniform convergence property of \(\frac{h}{n}\sum \limits _{i=1}^{n}\frac{\delta _{i}}{\hat{\pi }_{i}^{2}}K_{h}^{2}\left( X_{i}\!-\!x\right) \hat{\varepsilon }_{i}^{2}\). Notice that

$$\begin{aligned}&\sup _{x\in \left[ a_{0},b_{0}\right] }\left| \frac{h}{n}\mathop {\displaystyle \sum }\limits _{i=1}^{n}\frac{\delta _{i}}{\hat{\pi }_{i}^{2}}K_{h}^{2}\left( X_{i}\!-\!x\right) \hat{\varepsilon }_{i}^{2}-\frac{h}{n}\mathop {\displaystyle \sum }\limits _{i=1}^{n} \frac{\delta _{i}}{\pi _{i}^{2}}K_{h}^{2}\left( X_{i}\!-\!x\right) \hat{ \varepsilon }_{i}^{2}\right| \\&=\sup _{x\in \left[ a_{0},b_{0}\right] }\left| \frac{h}{n}\mathop {\displaystyle \sum }\limits _{i=1}^{n}\frac{\delta _{i}\left( \pi _{i}^{2}-\hat{\pi } _{i}^{2}\right) }{\hat{\pi }_{i}^{2}\pi _{i}^{2}}K_{h}^{2}\left( X_{i}\!-\!x\right) \left\{ m\left( X_{i}\right) -{\hat{m}}\left( X_{i},\hat{\pi }_{i}\right) +\varepsilon _{i}\right\} ^{2}\right| \\&=o_{p}\left( n^{-1/2}h^{-1}\right) \end{aligned}$$

and

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \frac{h}{n}\mathop {\displaystyle \sum }\limits _{i=1}^{n}\frac{\delta _{i}}{\pi _{i}^{2}}K_{h}^{2}\left( X_{i}\!-\!x\right) \left( \hat{\varepsilon }_{i}^{2}-\varepsilon _{i}^{2}\right) \right| =O_{p}\left( n^{-1/2}h^{-3/2}\log ^{1/2}n\right) , \end{aligned}$$

which imply that

$$\begin{aligned}&\sup _{x\in \left[ a_{0},b_{0}\right] }\left| \frac{h}{n}\mathop {\displaystyle \sum }\limits _{i=1}^{n}\frac{\delta _{i}}{\hat{\pi }_{i}^{2}}K_{h}^{2}\left( X_{i}\!-\!x\right) \hat{\varepsilon }_{i}^{2}-\frac{h}{n}\mathop {\displaystyle \sum }\limits _{i=1}^{n} \frac{\delta _{i}}{\pi _{i}^{2}}K_{h}^{2}\left( X_{i}\!-\!x\right) \varepsilon _{i}^{2}\right| \nonumber \\&=O_{p}\left( n^{-1/2}h^{-3/2}\log ^{1/2}n\right) . \end{aligned}$$
(23)

Secondly, denote \(\varepsilon _{i}^{*}=\delta _{i}\pi _{i}^{-2}\varepsilon _{i}^{2}-E\left( \delta _{i}\pi _{i}^{-2}\varepsilon _{i}^{2}\mid X_{i}\right) \). By applying the inequality in Lemma 1, the Borel-Cantelli lemma, and the truncation and discretization method as in the proof of Lemma 2, one obtains that

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \frac{h}{n}\mathop {\displaystyle \sum }\limits _{i=1}^{n}\!K_{h}^{2}\left( X_{i}\!-x\right) \varepsilon _{i}^{*}\right| =O_{p}\left( n^{-1/2}h^{-1/2}\log ^{1/2}n\right) \end{aligned}$$
(24)

as \(n\rightarrow \infty \). Meanwhile, similar to the proof of Lemma 3, one can easily show that

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \frac{h}{n}\mathop {\displaystyle \sum }\limits _{i=1}^{n}K_{h}^{2}\left( X_{i}\!-x\right) E\left( \delta _{i}\pi _{i}^{-2}\varepsilon _{i}^{2}\mid X_{i}\right) -f_{X}\left( x\right) E\left( \delta _{1}\pi _{1}^{-2}\varepsilon _{1}^{2}\mid X_{1}=x\right) \int K^{2}\left( u\right) du\right| \\ =o_{p}\left( h\right) +O_{p}\left( n^{-1/2}h^{-1/2}\log ^{1/2}n\right) . \end{aligned}$$
(25)

Combining (23), (24), and (25), one has

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] }\left| \frac{h}{n}\mathop {\displaystyle \sum }\limits _{i=1}^{n}\frac{\delta _{i}}{\hat{\pi }_{i}^{2}}K_{h}^{2}\left( X_{i}\!-\!x\right) \hat{\varepsilon }_{i}^{2}-f_{X}\left( x\right) E\left( \delta _{1}\pi _{1}^{-2}\varepsilon _{1}^{2}\mid X_{1}=x\right) \int K^{2}\left( u\right) du\right| \\ =O_{p}\left( n^{-1/2}h^{-3/2}\log ^{1/2}n\right) . \end{aligned}$$

Meanwhile, by Lemmas 3 and 5, and \(h_f=O(n^{-1/5})\), one can easily obtain that

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] }\left| {\hat{f}}_{X}\left( x\right) -\! f_{X}\left( x\right) \right| =o_{p}\left( h_f\right) +O_{p}\left( n^{-1/2}h_f^{-1/2}\log ^{1/2}n \right) =o_{p}\left( n^{-1/5} \right) \!. \end{aligned}$$

Thus,

$$\begin{aligned} \sup _{x\in \left[ a_{0},b_{0}\right] }\left| {\hat{f}}_{X}^{-2}\left( x\right) -f_{X}^{-2}\left( x\right) \right| =o_{p}\left( n^{-1/5}\right) , \end{aligned}$$

which together with the fact that

$$\begin{aligned} d\left( x\right) P\left( \delta _{1}=1\right) =f_{X}^{-1}\left( x\right) E\left( \delta _{1}\pi _{1}^{-2}\varepsilon _{1}^{2}\mid X_{1}=x\right) \int K^{2}\left( u\right) du \end{aligned}$$

implies

$$\begin{aligned}&\sup _{x\in \left[ a_{0},b_{0}\right] }\left| {\hat{f}}_{X}^{-2}\left( x\right) \frac{h}{n}\mathop {\displaystyle \sum }\limits _{i=1}^{n}\!\frac{\delta _{i}}{\hat{\pi } _{i}^{2}}K_{h}^{2}\left( X_{i}\!-\!x\right) \hat{\varepsilon } _{i}^{2}-d\left( x\right) P\left( \delta _{1}=1\right) \right| \nonumber \\&=O_{p}\left( n^{-1/2}h^{-3/2}\log ^{1/2}n\right) . \end{aligned}$$
(26)

It is easily seen from (22), (6), and (26) that

$$\begin{aligned}&\sup _{x\in \left[ a_{0},b_{0}\right] }\left| {\hat{d}}_{n}\left( x\right) -d\left( x\right) \right| =O_{p}\left( n^{-1/2}h^{-3/2}\log ^{1/2}n\right) , \end{aligned}$$

completing the proof. \(\Box \)
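Putting Theorems 2 and 5 together, the asymptotic \(\left( 1-\alpha \right) \) simultaneous band at a point \(x\) is \({\hat{m}}\left( x,\hat{\pi }\right) \pm \left\{ r_{n}/\left( nh\right) \right\} ^{1/2}{\hat{d}}_{n}^{1/2}\left( x\right) \left( b_{h}+t_{1-\alpha }/a_{h}\right) \), where \(t_{1-\alpha }\) solves \(\exp \left\{ -2\exp \left( -t\right) \right\} =1-\alpha \). A minimal sketch, taking the normalizing sequences \(a_{h}\), \(b_{h}\), and \(r_{n}\) (defined with Theorem 2 in the main text) as given inputs; the function name is illustrative:

```python
import math

def simultaneous_band(m_hat_x, d_hat_x, n, h, r_n, a_h, b_h, alpha=0.05):
    """Inverts the Gumbel limit of Theorem 2 into a (1 - alpha)
    simultaneous band at a single grid point x."""
    t = -math.log(-0.5 * math.log(1.0 - alpha))        # Gumbel quantile
    half = math.sqrt(r_n / (n * h) * d_hat_x) * (b_h + t / a_h)
    return m_hat_x - half, m_hat_x + half
```

Evaluating this over a grid of \(x\) values in \(\left[ a_{0},b_{0}\right] \) traces out the band; by Theorem 3 the same band is asymptotically valid whether \(\pi \) or \(\hat{\pi }\) is used.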


Cite this article

Cai, L., Gu, L., Wang, Q. et al. Simultaneous confidence bands for nonparametric regression with missing covariate data. Ann Inst Stat Math 73, 1249–1279 (2021). https://doi.org/10.1007/s10463-021-00784-5
