Bootstrapping Nonparametric M-Smoothers with Independent Error Terms

Maciak, Matúš

doi:10.1007/978-3-319-96941-1_16

Matúš Maciak⁵

Part of the book series: Springer Proceedings in Mathematics & Statistics ((PROMS,volume 250))

Included in the following conference series:

Conference of the International Society for Non-Parametric Statistics

1066 Accesses

Abstract

On the one hand, nonparametric regression approaches are flexible modeling tools in modern statistics. On the other hand, the lack of any parameters makes these approaches more challenging when assessing some statistical inference in these models. This is crucial especially in situations when one needs to perform some statistical tests or to construct some confidence sets. In such cases, it is common to use a bootstrap approximation instead. It is an effective alternative to more straightforward but rather slow plug-in techniques. In this contribution, we introduce a proper bootstrap algorithm for a robustified version of the nonparametric estimates, the so-called M-smoothers or M-estimates, respectively. We distinguish situations for homoscedastic and heteroscedastic independent error terms, and we prove the consistency of the bootstrap approximation under both scenarios. Technical proofs are provided and the finite sample properties are investigated via a simulation study.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Antoch, J., & Janssen P. (1989). Nonparametric regression m-quantiles. Statistics & Probability Letters, 8, 355–362.
Google Scholar
Boente, G., Ruiz, M., & Zamar, R. (2010). On a robust local estimator for the scale function in heteroscedastic nonparametric regression. Statistics & Probability Letters, 80, 1185–1195.
Google Scholar
Fan, J., & Gijbels, I. (1995). Local polynomial modelling and its applications (1st ed.). Boca Raton, FL: Chapman & Hall
Google Scholar
Hall, P., Kay, J., & Titterington, D. (1990). Asymptotically optimal difference-based estimation of variance in nonparametric regression. Biometrica, 77, 521–528.
Google Scholar
Härdle, W., & Gasser, T. (1984). Robust nonparametric function fitting. Journal of the Royal Statistical Society, Series B, 46, 42–51.
Google Scholar
Härdle, W. K., & Marron, J. S. (1991). MBootstrap simultaneous error bars for nonparametric regression. The Annals of Statistics, 19(2), 778–796.
Google Scholar
Hušková, M., & Maciak, M. (2017). Discontinuities in robust nonparametric regression with α-mixing dependence. Journal Nonparametric Statistics, 29(2), 447–475.
Google Scholar
Hwang, R. C. (2002). A new version of the local constant M-smoother. Communications in Statistics: Theory and Methods, 31, 833–848
Google Scholar
Leung, D. (2005). Cross-validation in nonparametric regression with outliers. Annals of Statistics, 33, 2291–2310.
Google Scholar
Leung, D., Marriott, F., & Wu, E. (1993). Bandwidth selection in robust smoothing. Journal of Nonparametric Statistics, 2, 333–339.
Google Scholar
Maciak, M. (2011). Flexibility, Robustness, and Discontinuity in Nonparametric Regression Approaches. Ph.D. thesis, Charles University, Prague.
Google Scholar
Müller, H., & Stadtmüller, U. (1987). Estimation of heteroscedasticity in regression analysis. Annals of Statistics, 15, 610–625.
Google Scholar
Nadaraya, E. (1964). On estimating regression. Theory Probability Applications, 9, 141–142.
Google Scholar
Neumeyer, N. (2006). Bootstrap Procedures for Empirical Processes of Nonparametric Residuals. Habilitationsschrift, Ruhr-University Bochum, Germany.
Google Scholar
Rue, H., Chu, C., Godtliebsen, F., & Marron, J. (1998). M-smoother with local linear fit. Journal of Nonparametric Statistics, 14, 155–168.
Google Scholar
Serfling, R. (1980). Approximation theorems of mathematical statistics. New York: Wiley.
Google Scholar
Watson, G. (1964). Smooth regression analysis. The Indian Journal of Statistics, Series A, 26, 359–372.
Google Scholar

Download references

Acknowledgements

The author’s research was partly supported by the Grant P402/12/G097.

Author information

Authors and Affiliations

Department of Probability and Mathematical Statistics, Charles University, Prague, Czech Republic
Matúš Maciak

Authors

Matúš Maciak
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Matúš Maciak .

Editor information

Editors and Affiliations

MODAL’X, Paris West University Nanterre La Défense, Nanterre, France
Patrice Bertail
LMA, Avignon University, Avignon, France
Delphine Blanke
MIASHS, University of Rennes 2, Rennes, France
Pierre-André Cornillon
Formation Continue CEPE, Ecole Nationale de la Statistique et de l’Administration, Malakoff, France
Eric Matzner-Løber

Appendix

In this section we provide some technical details and the proof of the bootstrap consistency result stated in Theorem 2. Let $\{(X_{i}, Y_{i}^{\star });~i = 1, \dots , n\}$ be the bootstrapped data where $Y_{i}^{\star } = \widehat {m}(X_{i}) + \widehat {\sigma }(X_{i})\varepsilon _{i}^{\star }$, for $\widehat {m}(X_{i})$ being the M-smoother estimate of m(X _i), $\widehat {\sigma }(X_{i})$ the estimate of σ(X _i) in sense of (5), and the random error terms $\{\varepsilon _{i}^{\star }\}_{i =1 }^{n}$ are defined in B3 step of the bootstrap algorithm in Sect. 3. Then, we can obtain the bootstrapped version of $\widehat {m}(x)$, for some x ∈ (0, 1), given as a solution of the minimization problem

$$\displaystyle \begin{aligned} \widehat{\boldsymbol{\beta}}_{x}^{\star} = \operatorname*{Argmin}_{(b_{0}, \dots, b_{p})^{\top} \in \mathbb{R}^{p + 1}} \sum_{i = 1}^{n} \rho \bigg( Y_{i}^{\star} - \sum_{j = 0}^{p} b_{j} (X_{i} - x)^{j} \bigg) \cdot K \left(\frac{X_{i} - x}{h_{n}}\right), {} \end{aligned} $$

(6)

where $\widehat {\boldsymbol {\beta }}_{x}^{\star } = (\widehat {\beta }_{0}^{\star }, \dots , \widehat {\beta }_{p}^{\star })^{\top }$, and $\widehat {m}^{\star }(x) = \widehat {\beta }_{0}^{\star }$. Using the smoothness property of m(⋅) we can apply the Taylor expansion of the order p and given the model definition in (1) we can rewrite the minimization problem as an equivalent problem given by the following set of equations:

$$\displaystyle \begin{aligned} \frac{1}{\sqrt{n h_{n}}}\sum_{i = 1}^{n} \psi \bigg( \widehat{\sigma}(X_{i})\varepsilon_{i}^{\star} - \sum_{j = 0}^{p} b_{j} \Big(\frac{X_{i} - x}{h_{n}}\Big)^{j} \bigg) \cdot \left(\frac{X_{i} - x}{h_{n}}\right)^{\ell} K \left(\frac{X_{i} - x}{h_{n}}\right) = 0, {} \end{aligned}$$

for ℓ = 0, …, p, where ψ = ρ′. Next, for any ℓ ∈{0, …, p} and $\boldsymbol {b} \in \mathbb {R}^{p + 1}$ let us define an empirical process M _n(b, ℓ) and its bootstrap counterpart $M_{n}^{\star }(\boldsymbol {b}, \ell )$ as follows:

$$\displaystyle \begin{aligned} M_{n}(\boldsymbol{b}, \ell) & = \frac{1}{\sqrt{n h_{n}}}\sum_{i = 1}^{n} \Bigg[\psi \bigg( \sigma(X_{i})\varepsilon_{i} - \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x) \bigg) - \psi \bigg( \sigma(X_{i})\varepsilon_{i}\bigg)\Bigg]K(\xi_{i}^1(x)) \xi_{i}^{\ell}(x), \end{aligned} $$

(7)

and

$$\displaystyle \begin{aligned} M_{n}^{\star}(\boldsymbol{b}, \ell) & = \frac{1}{\sqrt{n h_{n}}}\sum_{i = 1}^{n} \Bigg[\psi \bigg( \widehat{\sigma}(X_{i})\varepsilon_{i}^{\star} - \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x) \bigg) - \psi \bigg( \widehat{\sigma}(X_{i})\varepsilon_{i}^{\star}\bigg)\Bigg]K(\xi_{i}^1(x)) \xi_{i}^{\ell}(x), \end{aligned} $$

(8)

where for brevity we used the notation $\xi _{i}^{\ell }(x) = \Big (\frac {X_{i} - x}{h_{n}}\Big )^{\ell }$. We need to investigate the behavior of $M_{n}^{\star }(\boldsymbol {b}, \ell )$, conditionally on the sample {(X _i, Y _i); i = 1, …, n)}, and we will compare it with the behavior of M _n(b, ℓ).

Let G ^⋆(⋅) be the distribution function of the bootstrap residuals $\{\varepsilon _{i}^{\star }\}_{i = 1}^{n}$ defined in B3. It follows from the definition that

$$\displaystyle \begin{aligned} G^{\star}(e) & = P^{\star}[\varepsilon_{i}^{\star} \leq e] = P^{\star}[V_{i}\cdot\tilde{\varepsilon}_{i} + a_{n}\cdot Z_{i} \leq e] \\ & = \frac{1}{2n}\Bigg[\int_{\mathbb{R}} \sum_{i = 1}^{n}\mathbb{I}_{\{\widehat{\varepsilon}_{i} \leq e - a_n u\}} \phi(u)\text{d}u + \int_{\mathbb{R}} \sum_{i = 1}^{n}\mathbb{I}_{\{\widehat{\varepsilon}_{i} \geq a_n u - e\}} \phi(u)\text{d}u\Bigg]\\ & = \frac{1}{2n} \sum_{i = 1}^{n} \left[\varPhi\left(\frac{e - \widehat{\varepsilon}_{i}}{a_n}\right) + \varPhi\left(\frac{e + \widehat{\varepsilon}_{i}}{a_n}\right)\right], \end{aligned} $$

where ϕ(⋅) and Φ(⋅) stand for the density and the distribution function of Z _i’s, which are assumed to be normally distributed with zero mean and unit variance. It is easy to verify that G ^⋆(⋅) is continuous, symmetric, and moreover, it satisfies Assumption A.2. Thus, for E ^⋆ being the conditional expectation operator when conditioning by the initial data sample, we obtain the following:

$$\displaystyle \begin{aligned} E^{\star} M_{n}^{\star}(\boldsymbol{b}, \ell) & = \frac{1}{\sqrt{n h_{n}}} \sum_{i = 1}^n E^{\star}\Big[ \psi \big( \widehat{\sigma}(X_{i})\varepsilon_{i} - \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x) \big) \Big] \cdot K(\xi_{i}^1(x)) \xi_{i}^{\ell}(x)\\ & = \frac{ - 1}{\sqrt{n h_{n}}} \sum_{i = 1}^n \lambda_{G^\star}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \widehat{\sigma}(X_{i}) \bigg)\cdot K(\xi_{i}^1(x)) \xi_{i}^{\ell}(x), \end{aligned} $$

where we used the symmetric property of the distribution function G ^⋆. Next, we obtain

$$\displaystyle \begin{aligned} \lambda_{G^\star}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \widehat{\sigma}(X_{i}) \bigg) & = - \int_{\mathbb{R}} \psi \bigg( \widehat{\sigma}(X_{i}) e - \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x) \bigg) \text{d}G^\star(e)\\ & = \lambda_{G}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \widehat{\sigma}(X_{i}) \bigg) {}\\ & - \int_{\mathbb{R}} \psi \bigg( \widehat{\sigma}(X_{i}) e - \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x) \bigg) \text{d}(G^\star - G)(e), \end{aligned} $$

(9)

where the last term can be shown to be asymptotically negligible due to the properties of ψ(⋅) and the fact that $\sup _{x \in \mathbb {R}} |G^{\star }(x) - G(x)| \to 0$ in probability (see Lemma 2.19 in [14]). For (9) we can use the Hölder property of λ _G(⋅) (Assumption A.4) and we get that

$$\displaystyle \begin{aligned} & \bigg|\lambda_{G}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \widehat{\sigma}(X_{i}) \bigg) - \lambda_{G}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \sigma(X_{i}) \bigg)\bigg| = o(1), \end{aligned} $$

and

$$\displaystyle \begin{aligned} & \bigg|\lambda_{G}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \sigma(X_{i}) \bigg) - \lambda_{G}\bigg( \sum_{j = 0}^{p} b_{j} \xi_{i}^{j}(x), \sigma(x) \bigg)\bigg| = o(1), \end{aligned} $$

where the first equation follows from the fact that $\widehat {\sigma }(X_{i})$ is a consistent estimate of σ(X _i) and the second follows from the fact that |X _i − x|≤ h _n. Both equations hold almost surely.

Putting everything together we obtain that

$$\displaystyle \begin{aligned} E^{\star} M_{n}^{\star}(\boldsymbol{b}, \ell) = E M_{n}(\boldsymbol{b}, \ell) + o_{P}(1), \end{aligned} $$

and, moreover, repeating the same steps also for the second moment $E^{\star } \big [M_{n}^{\star }(\boldsymbol {b}, \ell )\big ]^2$ and applying (3) and (4) we also obtain that $E^{\star } \big [M_{n}^{\star }(\boldsymbol {b}, \ell )\big ]^2 \to 0$ in probability.

To finish the proof we need the following lemma.

Lemma 1

Let the model in (1) hold and let Assumptions A.1–A.5 be all satisfied. Then the following holds:

$$\displaystyle \begin{aligned} \sup_{\|\boldsymbol{b}\| \leq C} \left| M_n^\star(\delta_{n} \boldsymbol{b}, \ell) + \frac{\delta_{n}}{\sqrt{n h_{n}}}\right.&\left. \lambda_{G}^{\prime}(0, \sigma(x)) \sum_{i = 1}^{n} \left(\frac{X_{i} - x}{h_{n}}\right)^{\ell} \right.\times\\ & \times \left. \left( \sum_{j = 0}^{p} b_{j} \left( \frac{X_{i} - x}{h_{n}} \right)^{j} \right) \cdot K\left(\frac{X_{i} - x}{h_{n}}\right) \right| = o_{P}\left(1\right), \end{aligned} $$

where ℓ = 0, …, p, C > 0, ∥b∥ = |b ₀| + ⋯ + |b _p|, δ _n = (nh _n)^−γ∕2, and γ ∈ (γ ₀, 1], for some 0 < γ ₀ ≤ 1.

Proof

Lemma 1 is a bootstrap version of Lemma 4 in [11] or a more general Lemma A.3 in [7]. The proof follows the same lines using the moment properties derived for $ M_n^\star (\delta _{n} \boldsymbol {b}, \ell )$. □

Lemma 4 in [11] allows us to express the classical M-smoother estimates $\widehat {\beta }_{x}$ in terms of the asymptotic Bahadur representations as

$$\displaystyle \begin{aligned} \frac{1}{(n h_{n})^{1/2}} \widehat{\beta}_{x} = \frac{(n h_{n})^{- 1/2}}{\lambda_{G}^{\prime}(0, \sigma(x))} \cdot \bigg(X_{n}^\top W_n X_{n}\bigg)^{-1} \cdot X_{n}^\top W_{n} \left(\begin{array}{c} \psi(\sigma(X_1) \varepsilon_{1})\\ \vdots\\ \psi(\sigma(X_n) \varepsilon_{n}) \end{array}\right) + o_{P}(1), \end{aligned} $$

while Lemma 1 allows us to express the bootstrapped counterparts $\widehat {\beta }_{x}^\star $ in a similar manner as

$$\displaystyle \begin{aligned} \frac{1}{(n h_{n})^{1/2}} \widehat{\beta}_{x}^{\star} = \frac{(n h_{n})^{- 1/2}}{\lambda_{G}^{\prime}(0, \sigma(x))} \cdot \bigg(X_{n}^\top W_n X_{n}\bigg)^{-1} \cdot X_{n}^\top W_{n} \left(\begin{array}{c} \psi(\widehat{\sigma}(X_{1}) \varepsilon_{1}^\star)\\ \vdots\\ \psi(\widehat{\sigma}(X_{n}) \varepsilon_{n}^\star) \end{array}\right) + o_{P}(1), \end{aligned} $$

where $W_{n} = Diag\Bigg \{K\bigg (\frac {X_{1} - x}{h_{n}}\bigg ), \dots , K\bigg (\frac {X_{n} - x}{h_{n}}\bigg )\Bigg \}$, and $X_{n} =\bigg (\big ( \frac {X_{i} - x}{h_{n}}\big )^j\bigg )_{i = 1, j = 0}^{n, p}$.

To finish the proof one just needs to realize that the sequences of random variables $\{\xi _{n i}\}_{i = 1}^n$ and $\{\xi _{n i}^\star \}_{i = 1}^n$ for $\xi _{n i} = \frac {1}{\sqrt {n h_{n}}} \psi (\sigma (X_{i})\varepsilon _{i}) \big (\frac {X_{i} - x}{h_{n}}\big )^\ell K\big (\frac {X_{i} - x}{h_{n}}\big )$ and $\xi _{n i}^\star = \frac {1}{\sqrt {n h_{n}}} \psi (\widehat {\sigma }(X_{i})\varepsilon _{i}^\star ) \big (\frac {X_{i} - x}{h_{n}}\big )^\ell K\big (\frac {X_{i} - x}{h_{n}}\big )$ both comply with the assumptions of the central limit theorem for triangular schemes and thus, random quantities $\sum _{i = 1}^n \xi _{n i}$ and $\sum _{i = 1}^n \xi _{n i}^\star $ both converge in distribution, conditionally on X _i’s and the original data {(X _i, Y _i); i = 1, …, n}, respectively, to the normal distribution with zero mean and the same variance parameter. $\square $

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Maciak, M. (2018). Bootstrapping Nonparametric M-Smoothers with Independent Error Terms. In: Bertail, P., Blanke, D., Cornillon, PA., Matzner-Løber, E. (eds) Nonparametric Statistics. ISNPS 2016. Springer Proceedings in Mathematics & Statistics, vol 250. Springer, Cham. https://doi.org/10.1007/978-3-319-96941-1_16

Download citation

DOI: https://doi.org/10.1007/978-3-319-96941-1_16
Published: 09 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96940-4
Online ISBN: 978-3-319-96941-1
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics

Bootstrapping Nonparametric M-Smoothers with Independent Error Terms

Abstract

Access this chapter

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendix

Appendix

Lemma 1

Proof

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation