Bootstrapping Nonparametric M-Smoothers with Independent Error Terms


Abstract

Nonparametric regression approaches are, on the one hand, flexible modeling tools in modern statistics. On the other hand, the absence of parameters makes statistical inference in these models more challenging. This is crucial especially in situations where one needs to perform statistical tests or to construct confidence sets. In such cases it is common to use a bootstrap approximation instead: an effective alternative to the more straightforward but rather slow plug-in techniques. In this contribution we introduce a proper bootstrap algorithm for a robustified version of nonparametric estimates, the so-called M-smoothers or M-estimates, respectively. We distinguish between homoscedastic and heteroscedastic independent error terms, and we prove the consistency of the bootstrap approximation under both scenarios. Technical proofs are provided and the finite sample properties are investigated via a simulation study.


References

  1. Antoch, J., & Janssen, P. (1989). Nonparametric regression M-quantiles. Statistics & Probability Letters, 8, 355–362.

  2. Boente, G., Ruiz, M., & Zamar, R. (2010). On a robust local estimator for the scale function in heteroscedastic nonparametric regression. Statistics & Probability Letters, 80, 1185–1195.

  3. Fan, J., & Gijbels, I. (1995). Local polynomial modelling and its applications (1st ed.). Boca Raton, FL: Chapman & Hall.

  4. Hall, P., Kay, J., & Titterington, D. (1990). Asymptotically optimal difference-based estimation of variance in nonparametric regression. Biometrika, 77, 521–528.

  5. Härdle, W., & Gasser, T. (1984). Robust nonparametric function fitting. Journal of the Royal Statistical Society, Series B, 46, 42–51.

  6. Härdle, W. K., & Marron, J. S. (1991). Bootstrap simultaneous error bars for nonparametric regression. The Annals of Statistics, 19(2), 778–796.

  7. Hušková, M., & Maciak, M. (2017). Discontinuities in robust nonparametric regression with α-mixing dependence. Journal of Nonparametric Statistics, 29(2), 447–475.

  8. Hwang, R. C. (2002). A new version of the local constant M-smoother. Communications in Statistics: Theory and Methods, 31, 833–848.

  9. Leung, D. (2005). Cross-validation in nonparametric regression with outliers. The Annals of Statistics, 33, 2291–2310.

  10. Leung, D., Marriott, F., & Wu, E. (1993). Bandwidth selection in robust smoothing. Journal of Nonparametric Statistics, 2, 333–339.

  11. Maciak, M. (2011). Flexibility, robustness, and discontinuity in nonparametric regression approaches. Ph.D. thesis, Charles University, Prague.

  12. Müller, H., & Stadtmüller, U. (1987). Estimation of heteroscedasticity in regression analysis. The Annals of Statistics, 15, 610–625.

  13. Nadaraya, E. (1964). On estimating regression. Theory of Probability and Its Applications, 9, 141–142.

  14. Neumeyer, N. (2006). Bootstrap procedures for empirical processes of nonparametric residuals. Habilitationsschrift, Ruhr-University Bochum, Germany.

  15. Rue, H., Chu, C., Godtliebsen, F., & Marron, J. (2002). M-smoother with local linear fit. Journal of Nonparametric Statistics, 14, 155–168.

  16. Serfling, R. (1980). Approximation theorems of mathematical statistics. New York: Wiley.

  17. Watson, G. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, 26, 359–372.


Acknowledgements

The author’s research was partly supported by grant P402/12/G097.

Author information


Correspondence to Matúš Maciak.


Appendix

In this section we provide technical details and the proof of the bootstrap consistency result stated in Theorem 2. Let \(\{(X_{i}, Y_{i}^{\star });~i = 1, \dots , n\}\) be the bootstrapped data, where \(Y_{i}^{\star } = \widehat {m}(X_{i}) + \widehat {\sigma }(X_{i})\varepsilon _{i}^{\star }\), with \(\widehat {m}(X_{i})\) being the M-smoother estimate of m(X i), \(\widehat {\sigma }(X_{i})\) the estimate of σ(X i) in the sense of (5), and the random error terms \(\{\varepsilon _{i}^{\star }\}_{i =1 }^{n}\) defined in step B3 of the bootstrap algorithm in Sect. 3. Then we obtain the bootstrapped version of \(\widehat {m}(x)\), for some x ∈ (0, 1), as a solution of the minimization problem

$$\displaystyle \begin{aligned} \widehat{\boldsymbol{\beta}}_{x}^{\star} = \operatorname*{Argmin}_{(b_{0}, \dots, b_{p})^{\top} \in \mathbb{R}^{p + 1}} \sum_{i = 1}^{n} \rho \bigg( Y_{i}^{\star} - \sum_{j = 0}^{p} b_{j} (X_{i} - x)^{j} \bigg) \cdot K \left(\frac{X_{i} - x}{h_{n}}\right), {} \end{aligned} $$
(6)

where \(\widehat {\boldsymbol {\beta }}_{x}^{\star } = (\widehat {\beta }_{0}^{\star }, \dots , \widehat {\beta }_{p}^{\star })^{\top }\) and \(\widehat {m}^{\star }(x) = \widehat {\beta }_{0}^{\star }\). Using the smoothness of m(⋅), we can apply a Taylor expansion of order p and, given the model definition in (1), rewrite the minimization problem as an equivalent system of estimating equations:

$$\displaystyle \begin{aligned} \frac{1}{\sqrt{n h_{n}}}\sum_{i = 1}^{n} \psi \bigg( \widehat{\sigma}(X_{i})\varepsilon_{i}^{\star} - \sum_{j = 0}^{p} b_{j} \Big(\frac{X_{i} - x}{h_{n}}\Big)^{j} \bigg) \cdot \left(\frac{X_{i} - x}{h_{n}}\right)^{\ell} K \left(\frac{X_{i} - x}{h_{n}}\right) = 0, {} \end{aligned}$$

for ℓ = 0, …, p, where ψ = ρ′. Next, for any ℓ ∈ {0, …, p} and \(\boldsymbol {b} \in \mathbb {R}^{p + 1}\), let us define an empirical process \(M_{n}(\boldsymbol {b}, \ell )\) and its bootstrap counterpart \(M_{n}^{\star }(\boldsymbol {b}, \ell )\) as follows:

$$\displaystyle \begin{aligned} M_{n}(\boldsymbol{b}, \ell) & = \frac{1}{\sqrt{n h_{n}}}\sum_{i = 1}^{n} \Bigg[\psi \bigg( \sigma(X_{i})\varepsilon_{i} - \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x) \bigg) - \psi \bigg( \sigma(X_{i})\varepsilon_{i}\bigg)\Bigg]K(\xi_{i}^1(x)) \xi_{i}^{\ell}(x), \end{aligned} $$
(7)

and

$$\displaystyle \begin{aligned} M_{n}^{\star}(\boldsymbol{b}, \ell) & = \frac{1}{\sqrt{n h_{n}}}\sum_{i = 1}^{n} \Bigg[\psi \bigg( \widehat{\sigma}(X_{i})\varepsilon_{i}^{\star} - \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x) \bigg) - \psi \bigg( \widehat{\sigma}(X_{i})\varepsilon_{i}^{\star}\bigg)\Bigg]K(\xi_{i}^1(x)) \xi_{i}^{\ell}(x), \end{aligned} $$
(8)

where for brevity we use the notation \(\xi _{i}^{\ell }(x) = \Big (\frac {X_{i} - x}{h_{n}}\Big )^{\ell }\). We need to investigate the behavior of \(M_{n}^{\star }(\boldsymbol {b}, \ell )\), conditionally on the sample {(X i, Y i); i = 1, …, n}, and compare it with the behavior of \(M_{n}(\boldsymbol {b}, \ell )\).

Let \(G^{\star }(\cdot )\) be the distribution function of the bootstrap residuals \(\{\varepsilon _{i}^{\star }\}_{i = 1}^{n}\) defined in step B3. It follows from the definition that

$$\displaystyle \begin{aligned} G^{\star}(e) & = P^{\star}[\varepsilon_{i}^{\star} \leq e] = P^{\star}[V_{i}\cdot\tilde{\varepsilon}_{i} + a_{n}\cdot Z_{i} \leq e] \\ & = \frac{1}{2n}\Bigg[\int_{\mathbb{R}} \sum_{i = 1}^{n}\mathbb{I}_{\{\widehat{\varepsilon}_{i} \leq e - a_n u\}} \phi(u)\text{d}u + \int_{\mathbb{R}} \sum_{i = 1}^{n}\mathbb{I}_{\{\widehat{\varepsilon}_{i} \geq a_n u - e\}} \phi(u)\text{d}u\Bigg]\\ & = \frac{1}{2n} \sum_{i = 1}^{n} \left[\varPhi\left(\frac{e - \widehat{\varepsilon}_{i}}{a_n}\right) + \varPhi\left(\frac{e + \widehat{\varepsilon}_{i}}{a_n}\right)\right], \end{aligned} $$

where ϕ(⋅) and Φ(⋅) stand for the density and the distribution function of the Z i’s, which are assumed to be normally distributed with zero mean and unit variance. It is easy to verify that \(G^{\star }(\cdot )\) is continuous and symmetric and, moreover, satisfies Assumption A.2. Thus, for \(E^{\star }\) denoting the conditional expectation operator given the initial data sample, we obtain the following:

$$\displaystyle \begin{aligned} E^{\star} M_{n}^{\star}(\boldsymbol{b}, \ell) & = \frac{1}{\sqrt{n h_{n}}} \sum_{i = 1}^n E^{\star}\Big[ \psi \big( \widehat{\sigma}(X_{i})\varepsilon_{i}^{\star} - \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x) \big) \Big] \cdot K(\xi_{i}^1(x)) \xi_{i}^{\ell}(x)\\ & = \frac{ - 1}{\sqrt{n h_{n}}} \sum_{i = 1}^n \lambda_{G^\star}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \widehat{\sigma}(X_{i}) \bigg)\cdot K(\xi_{i}^1(x)) \xi_{i}^{\ell}(x), \end{aligned} $$

where we used the symmetry of the distribution function \(G^{\star }\). Next, we obtain

$$\displaystyle \begin{aligned} \lambda_{G^\star}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \widehat{\sigma}(X_{i}) \bigg) & = - \int_{\mathbb{R}} \psi \bigg( \widehat{\sigma}(X_{i}) e - \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x) \bigg) \text{d}G^\star(e)\\ & = \lambda_{G}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \widehat{\sigma}(X_{i}) \bigg) {}\\ & - \int_{\mathbb{R}} \psi \bigg( \widehat{\sigma}(X_{i}) e - \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x) \bigg) \text{d}(G^\star - G)(e), \end{aligned} $$
(9)

where the last term can be shown to be asymptotically negligible due to the properties of ψ(⋅) and the fact that \(\sup _{x \in \mathbb {R}} |G^{\star }(x) - G(x)| \to 0\) in probability (see Lemma 2.19 in [14]). For the leading term in (9) we can use the Hölder property of λ G(⋅) (Assumption A.4) to get

$$\displaystyle \begin{aligned} & \bigg|\lambda_{G}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \widehat{\sigma}(X_{i}) \bigg) - \lambda_{G}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \sigma(X_{i}) \bigg)\bigg| = o(1), \end{aligned} $$

and

$$\displaystyle \begin{aligned} & \bigg|\lambda_{G}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \sigma(X_{i}) \bigg) - \lambda_{G}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \sigma(x) \bigg)\bigg| = o(1), \end{aligned} $$

where the first equation follows from the fact that \(\widehat {\sigma }(X_{i})\) is a consistent estimate of σ(X i), and the second from the fact that |X i − x|≤ h n on the support of the kernel. Both equations hold almost surely.

Putting everything together we obtain that

$$\displaystyle \begin{aligned} E^{\star} M_{n}^{\star}(\boldsymbol{b}, \ell) = E M_{n}(\boldsymbol{b}, \ell) + o_{P}(1), \end{aligned} $$

and, moreover, repeating the same steps for the second moment \(E^{\star } \big [M_{n}^{\star }(\boldsymbol {b}, \ell )\big ]^2\) and applying (3) and (4), we obtain that \(E^{\star } \big [M_{n}^{\star }(\boldsymbol {b}, \ell )\big ]^2 \to 0\) in probability.

To finish the proof we need the following lemma.

Lemma 1

Let the model in (1) hold and let Assumptions A.1–A.5 all be satisfied. Then the following holds:

$$\displaystyle \begin{aligned} \sup_{\|\boldsymbol{b}\| \leq C} \left| M_n^\star(\delta_{n} \boldsymbol{b}, \ell) + \frac{\delta_{n}}{\sqrt{n h_{n}}}\, \lambda_{G}^{\prime}(0, \sigma(x)) \sum_{i = 1}^{n} \left(\frac{X_{i} - x}{h_{n}}\right)^{\ell} \left( \sum_{j = 0}^{p} b_{j} \left( \frac{X_{i} - x}{h_{n}} \right)^{j} \right) K\left(\frac{X_{i} - x}{h_{n}}\right) \right| = o_{P}(1), \end{aligned} $$

where ℓ = 0, …, p, C > 0, \(\|\boldsymbol {b}\| = |b_{0}| + \dots + |b_{p}|\), \(\delta _{n} = (n h_{n})^{-\gamma /2}\), and γ ∈ (γ 0, 1], for some 0 < γ 0 ≤ 1.

Proof

Lemma 1 is a bootstrap version of Lemma 4 in [11], or of the more general Lemma A.3 in [7]. The proof follows the same lines, using the moment properties derived above for \( M_n^\star (\delta _{n} \boldsymbol {b}, \ell )\). □

Lemma 4 in [11] allows us to express the classical M-smoother estimates \(\widehat {\beta }_{x}\) in terms of an asymptotic Bahadur representation as

$$\displaystyle \begin{aligned} \frac{1}{(n h_{n})^{1/2}} \widehat{\beta}_{x} = \frac{(n h_{n})^{- 1/2}}{\lambda_{G}^{\prime}(0, \sigma(x))} \cdot \bigg(X_{n}^\top W_n X_{n}\bigg)^{-1} \cdot X_{n}^\top W_{n} \left(\begin{array}{c} \psi(\sigma(X_1) \varepsilon_{1})\\ \vdots\\ \psi(\sigma(X_n) \varepsilon_{n}) \end{array}\right) + o_{P}(1), \end{aligned} $$

while Lemma 1 allows us to express the bootstrapped counterparts \(\widehat {\beta }_{x}^\star \) in a similar manner as

$$\displaystyle \begin{aligned} \frac{1}{(n h_{n})^{1/2}} \widehat{\beta}_{x}^{\star} = \frac{(n h_{n})^{- 1/2}}{\lambda_{G}^{\prime}(0, \sigma(x))} \cdot \bigg(X_{n}^\top W_n X_{n}\bigg)^{-1} \cdot X_{n}^\top W_{n} \left(\begin{array}{c} \psi(\widehat{\sigma}(X_{1}) \varepsilon_{1}^\star)\\ \vdots\\ \psi(\widehat{\sigma}(X_{n}) \varepsilon_{n}^\star) \end{array}\right) + o_{P}(1), \end{aligned} $$

where \(W_{n} = Diag\Bigg \{K\bigg (\frac {X_{1} - x}{h_{n}}\bigg ), \dots , K\bigg (\frac {X_{n} - x}{h_{n}}\bigg )\Bigg \}\), and \(X_{n} =\bigg (\big ( \frac {X_{i} - x}{h_{n}}\big )^j\bigg )_{i = 1, j = 0}^{n, p}\).

To finish the proof, one only needs to realize that the sequences of random variables \(\{\xi _{n i}\}_{i = 1}^n\) and \(\{\xi _{n i}^\star \}_{i = 1}^n\), for \(\xi _{n i} = \frac {1}{\sqrt {n h_{n}}} \psi (\sigma (X_{i})\varepsilon _{i}) \big (\frac {X_{i} - x}{h_{n}}\big )^\ell K\big (\frac {X_{i} - x}{h_{n}}\big )\) and \(\xi _{n i}^\star = \frac {1}{\sqrt {n h_{n}}} \psi (\widehat {\sigma }(X_{i})\varepsilon _{i}^\star ) \big (\frac {X_{i} - x}{h_{n}}\big )^\ell K\big (\frac {X_{i} - x}{h_{n}}\big )\), both satisfy the assumptions of the central limit theorem for triangular arrays. Hence, the random quantities \(\sum _{i = 1}^n \xi _{n i}\) and \(\sum _{i = 1}^n \xi _{n i}^\star \) converge in distribution, conditionally on the X i’s and on the original data {(X i, Y i); i = 1, …, n}, respectively, to the normal distribution with zero mean and the same variance parameter. \(\square \)


Copyright information

© 2018 Springer Nature Switzerland AG


Cite this paper

Maciak, M. (2018). Bootstrapping Nonparametric M-Smoothers with Independent Error Terms. In: Bertail, P., Blanke, D., Cornillon, PA., Matzner-Løber, E. (eds) Nonparametric Statistics. ISNPS 2016. Springer Proceedings in Mathematics & Statistics, vol 250. Springer, Cham. https://doi.org/10.1007/978-3-319-96941-1_16
