Fully robust one-sided cross-validation for regression functions

Abstract

Fully robust OSCV is a modification of the OSCV method that produces consistent bandwidths for both smooth and nonsmooth regression functions. We propose a practical implementation of the method based on the robust cross-validation kernel \(H_I\) for the case when the Gaussian kernel \(\phi \) is used in computing the resulting regression estimate. The kernel \(H_I\) produces practically unbiased bandwidths in the smooth and nonsmooth cases and performs adequately in the data examples. The negative tails of \(H_I\) occasionally result in unacceptably wiggly OSCV curves in the neighborhood of zero. This problem can be resolved by selecting the bandwidth from the largest local minimum of the curve. A further search for robust kernels with the desired properties led us to consider the quartic kernel for cross-validation purposes. The quartic kernel is almost robust in the sense that, in the nonsmooth case, it substantially reduces the asymptotic relative bandwidth bias compared to \(\phi \). However, the quartic kernel is found to produce more variable bandwidths than \(\phi \). Nevertheless, it has the advantage of producing smoother OSCV curves than \(H_I\). A simplified scale-free version of the OSCV method based on a rescaled one-sided kernel is also proposed.
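The abstract mentions handling the wiggliness of the \(H_I\)-based OSCV curve near zero by taking the bandwidth at the largest local minimum of the curve. The selection rule can be sketched as follows; this is an illustrative sketch only (the function name, grid, and fallback rule are assumptions, not the authors' code):

```python
import numpy as np

def largest_local_min(h_grid, cv_vals):
    """Return the bandwidth at the rightmost (largest) interior local
    minimum of a cross-validation curve evaluated on an increasing
    bandwidth grid.  Falls back to the global minimizer if the curve
    has no interior local minimum."""
    h = np.asarray(h_grid, dtype=float)
    v = np.asarray(cv_vals, dtype=float)
    # Interior points strictly below both neighbors are local minima.
    interior = np.where((v[1:-1] < v[:-2]) & (v[1:-1] < v[2:]))[0] + 1
    if interior.size == 0:
        return h[np.argmin(v)]
    return h[interior[-1]]
```

Selecting the rightmost local minimizer discards the spurious minima that the negative tails of \(H_I\) can create at very small bandwidths.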




Author information

Corresponding author

Correspondence to Olga Y. Savchuk.

Appendix

Notation. For an arbitrary function \(g\), define the following functionals:

$$\begin{aligned}&\begin{array}{l} X_g(u)=\int _{-\infty }^\infty g(x)\cos (2\pi ux)\,dx,\\ Y_g(u)=\int _{-\infty }^\infty g(x)\sin (2\pi ux)\,dx, \end{array}\nonumber \\&I_g=\int _0^{\infty } u^2\left[ X_g^{\prime }(u)(X_g(u)-1)+Y_g^{\prime }(u)Y_g(u)\right] ^2\,du, \end{aligned}$$
(18)
$$\begin{aligned}&B_g=\int _0^\infty \left\{ z\bigl (1-D_g(z)\bigr )+G_g(z)\right\} ^2\,dz+\int _0^\infty \left\{ zD_g(-z)+G_g(-z)\right\} ^2\,dz,\nonumber \\ \end{aligned}$$
(19)

where for all z,

$$\begin{aligned} \begin{array}{l} \displaystyle {D_g(z)=\int _{-\infty }^z g(u)\,du},\\ \displaystyle {G_g(z)=\int _{-\infty }^z ug(u)\,du}. \end{array} \end{aligned}$$
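When \(g\) decays quickly, the functionals \(D_g\), \(G_g\), and \(B_g\) of Eq. (19) can be approximated by truncating the integrals to a finite range and applying the trapezoidal rule. The following is a numerical sketch under that assumption (the function names, truncation point, and the Gaussian example kernel are illustrative choices, not part of the paper):

```python
import numpy as np

def cum_trap(y, x):
    """Cumulative trapezoidal integral of y over x, starting from x[0]."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))

def B_functional(g, z_max=10.0, n=20001):
    """Approximate B_g of Eq. (19), truncating all integrals to
    [-z_max, z_max]; assumes g decays fast enough for the truncation."""
    z = np.linspace(-z_max, z_max, n)   # symmetric grid containing 0
    gv = g(z)
    D = cum_trap(gv, z)        # D_g(z) = int_{-inf}^z g(u) du (truncated)
    G = cum_trap(z * gv, z)    # G_g(z) = int_{-inf}^z u g(u) du (truncated)
    pos = z >= 0.0
    zp = z[pos]
    # Values of D_g(-z) and G_g(-z) on the same nonnegative grid.
    Dm = D[z <= 0.0][::-1]
    Gm = G[z <= 0.0][::-1]
    f1 = (zp * (1.0 - D[pos]) + G[pos]) ** 2
    f2 = (zp * Dm + Gm) ** 2
    return cum_trap(f1, zp)[-1] + cum_trap(f2, zp)[-1]

def gauss(x):
    """Gaussian kernel, used here only as an example of g."""
    return np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
```

For the Gaussian kernel, \(D_g=\varPhi \) and \(G_g(z)=-\phi (z)\), so the two bracketed terms in Eq. (19) coincide, which gives a convenient check on the implementation.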

Regression functions. The regression functions \(r_1\), \(r_2\), and \(r_3\) are defined below. Each function is defined on \(0\le x\le 1\).

$$\begin{aligned} \begin{array}{l} \displaystyle {r_1(x)=5x^{10}(1-x)^2+2.5x^2(1-x)^{10},}\\ \\ \displaystyle {r_2(x)={\left\{ \begin{array}{ll} 0.0125-0.05|x-0.25|,\qquad 0\le x\le 0.5,\\ 0.05|x-0.75|-0.0125,\qquad 0.5<x\le 1. \end{array}\right. }}\\ \\ \displaystyle { r_3(x)=\left\{ \begin{array}{ll} 0.047619\sqrt{x},&{}\quad 0\le x<0.1,\\ 0.035186e^{-20x}+0.010297,&{}\quad 0.1\le x<0.3,\\ 0.142857x-0.032473,&{}\quad 0.3\le x<0.35,\\ 0.142857(x-0.35)(x-0.45)+0.017527,&{}\quad 0.35\le x<0.6,\\ 0.151455-0.214286x,&{}\quad 0.6\le x<0.7,\\ 0.001455-0.214286(x-0.7)^3(x-0.4),&{}\quad 0.7\le x<0.8,\\ 0.004762\ln (10x-7.9)+0.012334,&{}\quad 0.8\le x\le 1. \end{array}\right. } \end{array} \end{aligned}$$
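The three regression functions transcribe directly into code. A minimal sketch as scalar Python functions, with names following the paper's notation:

```python
import math

def r1(x):
    return 5 * x ** 10 * (1 - x) ** 2 + 2.5 * x ** 2 * (1 - x) ** 10

def r2(x):
    # Piecewise-linear "sawtooth" function with kinks at 0.25, 0.5, 0.75.
    if x <= 0.5:
        return 0.0125 - 0.05 * abs(x - 0.25)
    return 0.05 * abs(x - 0.75) - 0.0125

def r3(x):
    # Seven-piece function combining root, exponential, linear,
    # quadratic, cubic, and logarithmic segments.
    if x < 0.1:
        return 0.047619 * math.sqrt(x)
    if x < 0.3:
        return 0.035186 * math.exp(-20 * x) + 0.010297
    if x < 0.35:
        return 0.142857 * x - 0.032473
    if x < 0.6:
        return 0.142857 * (x - 0.35) * (x - 0.45) + 0.017527
    if x < 0.7:
        return 0.151455 - 0.214286 * x
    if x < 0.8:
        return 0.001455 - 0.214286 * (x - 0.7) ** 3 * (x - 0.4)
    return 0.004762 * math.log(10 * x - 7.9) + 0.012334
```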


About this article


Cite this article

Savchuk, O.Y., Hart, J.D. Fully robust one-sided cross-validation for regression functions. Comput Stat 32, 1003–1025 (2017). https://doi.org/10.1007/s00180-017-0713-7


Keywords

  • Cross-validation
  • One-sided cross-validation
  • Local linear estimator
  • Bandwidth selection
  • Mean average squared error