
Fully robust one-sided cross-validation for regression functions

  • Original Paper, published in Computational Statistics

Abstract

Fully robust OSCV is a modification of the OSCV method that produces consistent bandwidths for both smooth and nonsmooth regression functions. We propose a practical implementation of the method based on the robust cross-validation kernel \(H_I\) in the case when the Gaussian kernel \(\phi \) is used in computing the resulting regression estimate. The kernel \(H_I\) produces practically unbiased bandwidths in the smooth and nonsmooth cases and performs adequately in the data examples. The negative tails of \(H_I\) occasionally produce unacceptably wiggly OSCV curves in the neighborhood of zero. This problem can be resolved by selecting the bandwidth from the largest local minimum of the curve. A further search for robust kernels with the desired properties led us to consider the quartic kernel for cross-validation purposes. The quartic kernel is almost robust in the sense that, in the nonsmooth case, it substantially reduces the asymptotic relative bandwidth bias compared to \(\phi \). However, the quartic kernel is found to produce more variable bandwidths than \(\phi \). Nevertheless, it has the advantage of producing smoother OSCV curves than \(H_I\). A simplified scale-free version of the OSCV method based on a rescaled one-sided kernel is also proposed.
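To make the one-sided idea concrete, the following minimal Python sketch computes an OSCV-type criterion in which each response is predicted from data strictly to its left. It uses a Nadaraya–Watson estimator with the Gaussian kernel \(\phi\) and a simple grid search; the function names and the grid are illustrative, and the paper's actual implementation (local linear estimation, the one-sided kernels such as \(H_I\), and the constant rescaling the one-sided bandwidth into the bandwidth of the resulting estimate) differs in detail.

```python
import numpy as np

def phi(u):
    """Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def oscv_criterion(b, x, y):
    """One-sided CV score: predict each response from the data strictly
    to its left, using a Nadaraya-Watson estimator with bandwidth b."""
    err = []
    for i in range(len(x)):
        left = x < x[i]              # one-sided: only points left of x[i]
        if left.sum() < 2:
            continue
        w = phi((x[i] - x[left]) / b)
        if w.sum() == 0:
            continue
        err.append((y[i] - np.dot(w, y[left]) / w.sum())**2)
    return np.mean(err)

def oscv_bandwidth(x, y, grid):
    """Return the minimizer of the OSCV curve over a bandwidth grid."""
    scores = [oscv_criterion(b, x, y) for b in grid]
    return grid[int(np.argmin(scores))]
```

In practice the minimizer of this curve is not used directly: it is multiplied by a kernel-dependent constant to obtain the bandwidth of the two-sided estimate, and (as discussed in the paper) when the curve is wiggly near zero the bandwidth should be taken from the largest local minimum rather than the global one.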



Author information

Correspondence to Olga Y. Savchuk.

Appendix

Notation For an arbitrary function g, define the following functionals:

$$\begin{aligned}&\begin{array}{l} X_g(u)=\int _{-\infty }^\infty g(x)\cos (2\pi ux)\,dx,\\ Y_g(u)=\int _{-\infty }^\infty g(x)\sin (2\pi ux)\,dx, \end{array}\nonumber \\&I_g=\int _0^{\infty } u^2\left[ X_g^{\prime }(u)(X_g(u)-1)+Y_g^{\prime }(u)Y_g(u)\right] ^2\,du, \end{aligned}$$
(18)
$$\begin{aligned}&B_g=\int _0^\infty \left\{ z\bigl (1-D_g(z)\bigr )+G_g(z)\right\} ^2\,dz+\int _0^\infty \left\{ zD_g(-z)+G_g(-z)\right\} ^2\,dz,\nonumber \\ \end{aligned}$$
(19)

where for all z,

$$\begin{aligned} \begin{array}{l} \displaystyle {D_g(z)=\int _{-\infty }^z g(u)\,du},\\ \displaystyle {G_g(z)=\int _{-\infty }^z ug(u)\,du}. \end{array} \end{aligned}$$
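Given the definitions above, the functional \(B_g\) of (19) can be checked numerically for a specific kernel. The sketch below evaluates \(B_\phi\) for the Gaussian kernel, for which \(D_\phi(z)=\Phi(z)\) (the standard normal cdf) and \(G_\phi(z)=\int_{-\infty}^z u\phi(u)\,du=-\phi(z)\); the trapezoidal quadrature and the truncation point are illustrative choices, not part of the paper.

```python
import numpy as np
from math import erf, exp, sqrt, pi

def Phi(z):
    """Standard normal cdf, i.e. D_phi(z) for the Gaussian kernel."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):
    """Gaussian kernel; note G_phi(z) = -phi(z)."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def _trapz(y, z):
    """Trapezoidal rule on a fixed grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(z)) / 2.0)

def B_gaussian(zmax=10.0, n=20001):
    """Evaluate B_g of (19) for g = phi by truncating the integrals
    at zmax (the integrands decay rapidly)."""
    z = np.linspace(0.0, zmax, n)
    # First integrand: z*(1 - D_phi(z)) + G_phi(z) = z*(1 - Phi(z)) - phi(z)
    g1 = np.array([(t * (1.0 - Phi(t)) - phi(t)) ** 2 for t in z])
    # Second integrand: z*D_phi(-z) + G_phi(-z) = z*Phi(-z) - phi(-z)
    g2 = np.array([(t * Phi(-t) - phi(-t)) ** 2 for t in z])
    return _trapz(g1, z) + _trapz(g2, z)
```

Since \(z(1-\Phi(z))=z\Phi(-z)\) and \(\phi(-z)=\phi(z)\), both integrands coincide for the Gaussian kernel and the two integrals contribute equally; for a one-sided kernel such as \(H_I\) they would differ, which is the point of keeping them separate in (19).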

Regression functions The regression functions \(r_1\), \(r_2\), and \(r_3\) are defined below. Each is defined for \(0\le x\le 1\).

$$\begin{aligned} \begin{array}{l} \displaystyle {r_1(x)=5x^{10}(1-x)^2+2.5x^2(1-x)^{10},}\\ \\ \displaystyle {r_2(x)={\left\{ \begin{array}{ll} 0.0125-0.05|x-0.25|,\qquad 0\le x\le 0.5,\\ 0.05|x-0.75|-0.0125,\qquad 0.5<x\le 1. \end{array}\right. }}\\ \\ \displaystyle { r_3(x)=\left\{ \begin{array}{ll} 0.047619\sqrt{x},&{}\quad 0\le x<0.1,\\ 0.035186e^{-20x}+0.010297,&{}\quad 0.1\le x<0.3,\\ 0.142857x-0.032473,&{}\quad 0.3\le x<0.35,\\ 0.142857(x-0.35)(x-0.45)+0.017527,&{}\quad 0.35\le x<0.6,\\ 0.151455-0.214286x,&{}\quad 0.6\le x<0.7,\\ 0.001455-0.214286(x-0.7)^3(x-0.4),&{}\quad 0.7\le x<0.8,\\ 0.004762\ln (10x-7.9)+0.012334,&{}\quad 0.8\le x\le 1. \end{array}\right. } \end{array} \end{aligned}$$
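The piecewise formulas above translate directly into code. The following Python sketch (the names r1, r2, r3 are ours) implements the three regression functions for use in reproducing the simulation settings.

```python
import numpy as np

def r1(x):
    """Smooth regression function r_1 on [0, 1]."""
    return 5 * x**10 * (1 - x)**2 + 2.5 * x**2 * (1 - x)**10

def r2(x):
    """Piecewise-linear (nonsmooth) regression function r_2 on [0, 1]."""
    return np.where(np.asarray(x) <= 0.5,
                    0.0125 - 0.05 * np.abs(x - 0.25),
                    0.05 * np.abs(x - 0.75) - 0.0125)

def r3(x):
    """Seven-piece regression function r_3 on [0, 1]."""
    x = np.asarray(x, dtype=float)
    conds = [x < 0.1,
             (0.1 <= x) & (x < 0.3),
             (0.3 <= x) & (x < 0.35),
             (0.35 <= x) & (x < 0.6),
             (0.6 <= x) & (x < 0.7),
             (0.7 <= x) & (x < 0.8),
             0.8 <= x]
    funcs = [lambda t: 0.047619 * np.sqrt(t),
             lambda t: 0.035186 * np.exp(-20 * t) + 0.010297,
             lambda t: 0.142857 * t - 0.032473,
             lambda t: 0.142857 * (t - 0.35) * (t - 0.45) + 0.017527,
             lambda t: 0.151455 - 0.214286 * t,
             lambda t: 0.001455 - 0.214286 * (t - 0.7)**3 * (t - 0.4),
             lambda t: 0.004762 * np.log(10 * t - 7.9) + 0.012334]
    return np.piecewise(x, conds, funcs)
```

Using non-overlapping conditions in np.piecewise ensures each branch is evaluated only on its own subinterval, so, for example, the logarithm in the last piece is never applied where its argument would be nonpositive.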

Cite this article

Savchuk, O.Y., Hart, J.D. Fully robust one-sided cross-validation for regression functions. Comput Stat 32, 1003–1025 (2017). https://doi.org/10.1007/s00180-017-0713-7

