Bandwidth matrix selectors for kernel regression

Abstract

Choosing a bandwidth matrix is one of the most significant problems in multivariate kernel regression. The difficulty is that the theoretically optimal bandwidth matrix depends on the unknown regression function to be estimated, so data-driven methods must be applied. The method proposed here is based on a relation between the asymptotic integrated square bias and the asymptotic integrated variance. Statistical properties of the method are also treated. The last two sections are devoted to simulations and an application to real data.

References

  • Aldershof B, Marron J, Park B, Wand M (1995) Facts about the Gaussian probability density function. Appl Anal 59:289–306

  • Chacón JE, Duong T, Wand MP (2011) Asymptotics for general multivariate kernel density derivative estimators. Stat Sin 21(2):807–840

  • Chiu S (1990) Why bandwidth selectors tend to choose smaller bandwidths, and a remedy. Biometrika 77(1):222–226

  • Chiu S (1991) Some stabilized bandwidth selectors for nonparametric regression. Ann Stat 19(3):1528–1546

  • Craven P, Wahba G (1979) Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer Math 31(4):377–403

  • Droge B (1996) Some comments on cross-validation. Tech. Rep. 1994-7, Humboldt Universitaet Berlin. http://ideas.repec.org/p/wop/humbsf/1994-7.html

  • Duong T, Hazelton M (2005a) Convergence rates for unconstrained bandwidth matrix selectors in multivariate kernel density estimation. J Multivar Anal 93(2):417–433

  • Duong T, Hazelton M (2005b) Cross-validation bandwidth matrices for multivariate kernel density estimation. Scand J Stat 32(3):485–506

  • Fan J (1993) Local linear regression smoothers and their minimax efficiencies. Ann Stat 21(1):196–216

  • Gasser T, Müller HG (1979) Kernel estimation of regression functions. In: Gasser T, Rosenblatt M (eds) Smoothing techniques for curve estimation. Lecture Notes in Mathematics, vol 757. Springer, Berlin, pp 23–68

  • Härdle W (1990) Applied nonparametric regression, 1st edn. Cambridge University Press, Cambridge

  • Härdle W (2004) Nonparametric and semiparametric models. Springer, Berlin

  • Herrmann E, Engel J, Wand M, Gasser T (1995) A bandwidth selector for bivariate kernel regression. J R Stat Soc Ser B (Methodological) 57:171–180

  • Horová I, Koláček J, Vopatová K (2013) Full bandwidth matrix selectors for gradient kernel density estimate. Comput Stat Data Anal 57(1):364–376

  • Horová I, Zelinka J (2007) Contribution to the bandwidth choice for kernel density estimates. Comput Stat 22(1):31–47

  • Jones MC, Kappenman RF (1991) On a class of kernel density estimate bandwidth selectors. Scand J Stat 19(4):337–349

  • Jones MC, Marron JS, Park BU (1991) A simple root n bandwidth selector. Ann Stat 19(4):1919–1932

  • Koláček J (2005) Kernel estimation of the regression function (in Czech). Ph.D. thesis, Masaryk University, Brno

  • Koláček J (2008) Plug-in method for nonparametric regression. Comput Stat 23(1):63–78

  • Koláček J, Horová I (2016) Selection of bandwidth for kernel regression. Commun Stat Theory Methods 45(5):1487–1500

  • Köhler M, Schindler A, Sperlich S (2014) A review and comparison of bandwidth selection methods for kernel regression. Int Stat Rev 82(2):243–274

  • Lafferty J, Wasserman L (2008) Rodeo: sparse, greedy nonparametric regression. Ann Stat 36:28–63

  • Lau G, Ooi PL, Phoon B (1998) Fatal falls from a height: the use of mathematical models to estimate the height of fall from the injuries sustained. Forensic Sci Int 93(1):33–44

  • Magnus JR, Neudecker H (1979) The commutation matrix: some properties and applications. Ann Stat 7(2):381–394

  • Magnus JR, Neudecker H (1999) Matrix differential calculus with applications in statistics and econometrics, 2nd edn. Wiley, New York

  • Manteiga WG, Miranda MM, González AP (2004) The choice of smoothing parameter in nonparametric regression through wild bootstrap. Comput Stat Data Anal 47(3):487–515

  • Rice J (1984) Bandwidth choice for nonparametric regression. Ann Stat 12(4):1215–1230

  • Ruppert D (1997) Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation. J Am Stat Assoc 92(439):1049–1062

  • Ruppert D, Wand MP (1994) Multivariate locally weighted least squares regression. Ann Stat 22:1346–1370

  • Seifert B, Gasser T (1996) Variance properties of local polynomials and ensuing modifications. In: Härdle W, Schimek M (eds) Statistical theory and computational aspects of smoothing, contributions to statistics. Physica, Heidelberg, pp 50–79

  • Silverman BW (1985) Some aspects of the spline smoothing approach to non-parametric regression curve fitting. J R Stat Soc Ser B (Methodological) 47:1–52

  • Simonoff JS (1996) Smoothing methods in statistics. Springer, New York

  • Staniswalis JG, Messer K, Finston DR (1993) Kernel estimators for multivariate regression. J Nonparametric Stat 3(2):103–121

  • Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B Stat Methodol 36(2):111–147

  • Wand M, Jones M (1993) Comparison of smoothing parameterizations in bivariate kernel density-estimation. J Am Stat Assoc 88(422):520–528

  • Wand M, Jones M (1995) Kernel smoothing. Chapman and Hall, London

  • Wand MP, Jones MC (1994) Multivariate plug-in bandwidth selection. Comput Stat 9(2):97–116

  • Yang L, Tschernig R (1999) Multivariate bandwidth selection for local linear regression. J R Stat Soc Ser B (Stat Methodol) 61(4):793–815

  • Zhang X, Brooks RD, King ML (2009) A Bayesian approach to bandwidth selection for multivariate kernel regression with an application to state-price density estimation. J Econom 153(1):21–32

Acknowledgements

This research was supported by Masaryk University, Project GAČR GA15-06991S.

Corresponding author

Correspondence to Jan Koláček.

Appendix: Proofs

First, we introduce some facts about matrix differential calculus and the Gaussian density (see Magnus and Neudecker 1979, 1999; Aldershof et al. 1995).

Let \(\mathbf {A},\, \mathbf {B}\) be \(d\times d\) matrices:

\(1^\circ \) :

\(\displaystyle {{\mathrm{tr}}}(\mathbf {A}^T \mathbf {B})=\mathrm {vec}^T \mathbf {A}\mathrm {vec}\mathbf {B}\)

\(2^\circ \) :

\(\displaystyle \int \phi _{c\mathbf {I}}(\mathbf {z})\{{{\mathrm{tr}}}(\mathbf {H}^{1/2} D^2 \mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T)\}m(\mathbf {x})d\mathbf {z}=c\{{{\mathrm{tr}}}(\mathbf {H}D^2)\}m(\mathbf {x})\)

\(\displaystyle \int \phi _{c\mathbf {I}}(\mathbf {z})\{{{\mathrm{tr}}}^2(\mathbf {H}^{1/2} D^2 \mathbf {H}^{1/2}\mathbf {z}\mathbf {z}^T) \}m(\mathbf {x}) d\mathbf {z}=3c^2\{{{\mathrm{tr}}}^2(\mathbf {H}D^2) \}m(\mathbf {x})\)

\(\displaystyle \int \phi _{c\mathbf {I}}(\mathbf {z})\{{{\mathrm{tr}}}^k(\mathbf {H}^{1/2} D^2\mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T) {{\mathrm{tr}}}(\mathbf {H}^{1/2}D\mathbf {z}^T)\}m(\mathbf {x})d\mathbf {z}=\mathbf {0},\quad c,\,k\in \mathbb {N}_0\)

\(3^\circ \) :

\(\displaystyle \int D^km(\mathbf {x})[D^km(\mathbf {x})]^Td\mathbf {x}= (-1)^k\int D^{2k}m(\mathbf {x})m(\mathbf {x})d\mathbf {x},\quad k\in \mathbb {N}\)

\(4^\circ \) :

Let \(\displaystyle \varLambda (\mathbf {z})=\phi _{4\mathbf {I}}(\mathbf {z})-2\phi _{3\mathbf {I}}(\mathbf {z})+\phi _{2\mathbf {I}}(\mathbf {z})\);

then applying \(2^\circ \) with \(c=4,3,2\) yields (e.g. the coefficient \(6\) below arises as \(3\cdot 4^2-2\cdot 3\cdot 3^2+3\cdot 2^2\))

\(\displaystyle \int \varLambda (\mathbf {z})d\mathbf {z}= 0\)

\(\displaystyle \int \varLambda (\mathbf {z})\{{{\mathrm{tr}}}(\mathbf {H}^{1/2} D^2\mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T)\}m(\mathbf {x})d\mathbf {z}= \mathbf {0}\)

\(\displaystyle \int \varLambda (\mathbf {z})\{{{\mathrm{tr}}}^2(\mathbf {H}^{1/2} D^2\mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T) \}m(\mathbf {x}) d\mathbf {z}=6\{{{\mathrm{tr}}}^2(\mathbf {H}D^2) \}m(\mathbf {x})\)

\(\displaystyle \int \varLambda (\mathbf {z})\{{{\mathrm{tr}}}^k(\mathbf {H}^{1/2} D^2\mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T) {{\mathrm{tr}}}(\mathbf {H}^{1/2}D\mathbf {z}^T)\}m(\mathbf {x})d\mathbf {z}=\mathbf {0},\quad k\in \mathbb {N}_0\)

\(5^\circ \) :

Taylor expansion takes the form

$$\begin{aligned} \begin{aligned} m(\mathbf {x}-\mathbf {H}^{1/2}\mathbf {z})&= m(\mathbf {x})-\{\mathbf {z}^T\mathbf {H}^{1/2}D\}m(\mathbf {x})\\&\quad + \frac{1}{2!}\{(\mathbf {z}^T\mathbf {H}^{1/2}D)^2\}m(\mathbf {x})+\dots \\&\quad +\frac{(-1)^k}{k!}\{(\mathbf {z}^T\mathbf {H}^{1/2}D)^k\}m(\mathbf {x})\\&\quad + o(||\mathbf {H}^{1/2}\mathbf {z}||^k). \end{aligned} \end{aligned}$$

Lemma 2

The integrated square bias can be expressed as

$$\begin{aligned} {{\mathrm{ISB}}}(\mathbf {H}) = \int \left( (K_{\mathbf {H}}*m)(\mathbf {x}) -m(\mathbf {x}) \right) ^2\,{{\mathrm{d}}}\mathbf {x}+ O(n^{-1}), \end{aligned}$$

where the symbol \(*\) denotes convolution.

Proof

The estimator can be written as

$$\begin{aligned} \widehat{m}(\mathbf {x},\mathbf {H}) =\sum _{i=1}^n \int \limits _{A_i} K_{\mathbf {H}}(\mathbf {x}-\mathbf {z})\,{{\mathrm{d}}}\mathbf {z}\, Y_i . \end{aligned}$$

Each integral in the sum can be approximated in the following way

$$\begin{aligned} \int \limits _{A_i} K_{\mathbf {H}}(\mathbf {x}-\mathbf {z})\,{{\mathrm{d}}}\mathbf {z}= \underbrace{\lambda (A_i)}_{\frac{1}{n}}K_{\mathbf {H}}(\mathbf {x}-\mathbf {x}_i) + O\left( n^{-1}\right) \end{aligned}$$

Thus

$$\begin{aligned} \widehat{m}(\mathbf {x},\mathbf {H}) =\frac{1}{n}\sum _{i=1}^n K_{\mathbf {H}}(\mathbf {x}-\mathbf {x}_i)Y_i + O\left( n^{-1}\right) . \end{aligned}$$

Further

$$\begin{aligned} E\widehat{m}(\mathbf {x},\mathbf {H})&=\frac{1}{n}\sum _{i=1}^n K_{\mathbf {H}}(\mathbf {x}-\mathbf {x}_i)EY_i + O\left( n^{-1}\right) \\&=\frac{1}{n}\sum _{i=1}^n K_{\mathbf {H}}(\mathbf {x}-\mathbf {x}_i)m(\mathbf {x}_i) + O\left( n^{-1}\right) \\&=\int K_{\mathbf {H}}(\mathbf {x}-\mathbf {z})m(\mathbf {z})d\mathbf {z}+ O\left( n^{-1}\right) \\&= (K_{\mathbf {H}}*m)(\mathbf {x})+ O\left( n^{-1}\right) . \end{aligned}$$

Hence \({{\mathrm{ISB}}}(\mathbf {H})=\int \left( E\widehat{m}(\mathbf {x},\mathbf {H})-m(\mathbf {x})\right) ^2{{\mathrm{d}}}\mathbf {x}=\int \left( (K_{\mathbf {H}}*m)(\mathbf {x})-m(\mathbf {x})\right) ^2{{\mathrm{d}}}\mathbf {x}+ O(n^{-1})\). \(\square \)
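The proof above rests on the internal-points form \(\widehat{m}(\mathbf {x},\mathbf {H})=\frac{1}{n}\sum _{i=1}^n K_{\mathbf {H}}(\mathbf {x}-\mathbf {x}_i)Y_i\). The following is a minimal numerical sketch of this form, not taken from the paper: it assumes a Gaussian kernel, for which \(K_{\mathbf {H}}(\mathbf {u})=|\mathbf {H}|^{-1/2}K(\mathbf {H}^{-1/2}\mathbf {u})\) is the \(N(\mathbf {0},\mathbf {H})\) density; the function name m_hat and the SciPy dependency are illustrative choices, not the authors' implementation.

import numpy as np
from scipy.stats import multivariate_normal

def m_hat(x, X, Y, H):
    """Internal-points approximation (1/n) * sum_i K_H(x - x_i) * Y_i.

    Assumes a Gaussian kernel, so K_H(u) = |H|^{-1/2} K(H^{-1/2} u) is the N(0, H) density.
    x: evaluation point (d,); X: design points (n, d); Y: responses (n,); H: bandwidth matrix (d, d).
    """
    n, d = X.shape
    weights = multivariate_normal(mean=np.zeros(d), cov=H).pdf(x - X)  # K_H(x - x_i), i = 1..n
    return weights @ Y / n

Replacing the cell integrals over \(A_i\) by these kernel values is exactly the \(O(n^{-1})\) approximation step used in the proof.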

Sketch of the proof of Theorem 1:

Proof

In order to show that \(\widehat{\varGamma }\) is an asymptotically unbiased estimator of \(\varGamma \), we evaluate \(E(\widehat{{{\mathrm{AISB}}}}(\mathbf {H}))\):

$$\begin{aligned} \begin{aligned} E(\widehat{{{\mathrm{AISB}}}}(\mathbf {H}))&=\frac{1}{n^2}\sum _{\begin{array}{c} i,j=1 \\ i\ne j \end{array}}^n \varLambda _\mathbf {H}(\mathbf {x}_i-\mathbf {x}_j)m(\mathbf {x}_i)m(\mathbf {x}_j)\\&=\iint \varLambda _\mathbf {H}(\mathbf {x}-\mathbf {y})m(\mathbf {y})m(\mathbf {x})d\mathbf {y}d\mathbf {x}+ O\left( n^{-1}\right) \\&=\iint \varLambda (\mathbf {z})m(\mathbf {x}-\mathbf {H}^{1/2}\mathbf {z})m(\mathbf {x})d\mathbf {z}d\mathbf {x}+ O\left( n^{-1}\right) .\\ \end{aligned} \end{aligned}$$

Applying the Taylor expansion from \(5^{\circ }\) together with the properties in \(4^\circ \) yields

$$\begin{aligned} \begin{aligned}&= \iint \varLambda (\mathbf {z})\Biggl (\sum _{i=0}^5\frac{(-1)^i}{i!}\{(\mathbf {z}^T\mathbf {H}^{1/2}D)^i\}m(\mathbf {x}) \\&\phantom {==}+ o(||\mathbf {H}^{1/2}\mathbf {z}||^5)\Biggr )m(\mathbf {x})d\mathbf {z}d\mathbf {x}+ O\left( n^{-1}\right) \\&= \iint \varLambda (\mathbf {z})\biggl (\frac{1}{4!}\{(\mathbf {z}^T\mathbf {H}^{1/2}D)^4\}m(\mathbf {x}) +o(||\mathbf {H}^{1/2}\mathbf {z}||^5)\biggr )m(\mathbf {x})d\mathbf {z}d\mathbf {x}\\&\qquad + O\left( n^{-1}\right) \\&=\frac{1}{4!}\iint \varLambda (\mathbf {z})\{{{\mathrm{tr}}}^2(\mathbf {H}^{1/2} D^2\mathbf {H}^{1/2} \mathbf {z}\mathbf {z}^T) \}m(\mathbf {x})m(\mathbf {x})d\mathbf {z}d\mathbf {x}\\&\qquad +o(||\mathrm {vec}\mathbf {H}||^{5/2})+ O\left( n^{-1}\right) , \\ \end{aligned} \end{aligned}$$

using properties \(1^\circ \), \(3^\circ \) and \(4^\circ \), we arrive at

$$\begin{aligned} \begin{aligned}&=\frac{1}{4} \int \{{{\mathrm{tr}}}^2(\mathbf {H}D^2) \}m(\mathbf {x}) m(\mathbf {x})d\mathbf {x}+o(||\mathrm {vec}\mathbf {H}||^{5/2}) + O\left( n^{-1}\right) \\&=\frac{1}{4}\int ||\{{{\mathrm{tr}}}(\mathbf {H}D^{2})\}m(\mathbf {x})||^2d\mathbf {x}+o(||\mathrm {vec}\mathbf {H}||^{5/2}) + O\left( n^{-1}\right) \\&=\frac{1}{4}\mathrm {vec}^T\mathbf {H}V(\{\mathrm {vec}D^{2}\}m)\mathrm {vec}\mathbf {H}+o(||\mathrm {vec}\mathbf {H}||^{5/2})+ O\left( n^{-1}\right) . \end{aligned} \end{aligned}$$

To finish the proof of Theorem 1, it is sufficient to evaluate \({{\mathrm{Var}}}(\widehat{{{\mathrm{AISB}}}}(\mathbf {H}))\):

$$\begin{aligned} \begin{aligned} {{\mathrm{Var}}}(\widehat{{{\mathrm{AISB}}}}(\mathbf {H}))&=\frac{1}{n^4}\sum _{\begin{array}{c} i,j=1 \\ i\ne j \end{array}}^n \varLambda ^2_\mathbf {H}(\mathbf {x}_i-\mathbf {x}_j){{\mathrm{Var}}}Y_iY_j\\&=\frac{\sigma ^4}{n^4}\sum _{\begin{array}{c} i,j=1 \\ i\ne j \end{array}}^n \varLambda ^2_\mathbf {H}(\mathbf {x}_i-\mathbf {x}_j). \end{aligned} \end{aligned}$$

Since the estimator \(\widehat{{{\mathrm{AISB}}}}(\mathbf {H})\) is asymptotically unbiased and its variance tends to zero, the estimator is consistent. \(\square \)
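To illustrate the statistic treated above, here is a minimal numerical sketch; it is not taken from the paper and assumes that \(\widehat{{{\mathrm{AISB}}}}(\mathbf {H})=\frac{1}{n^2}\sum _{i\ne j}\varLambda _{\mathbf {H}}(\mathbf {x}_i-\mathbf {x}_j)Y_iY_j\) with the standard rescaling \(\varLambda _{\mathbf {H}}(\mathbf {u})=|\mathbf {H}|^{-1/2}\varLambda (\mathbf {H}^{-1/2}\mathbf {u})=\phi _{4\mathbf {H}}(\mathbf {u})-2\phi _{3\mathbf {H}}(\mathbf {u})+\phi _{2\mathbf {H}}(\mathbf {u})\), the form whose expectation appears in the first display of the proof; the function names are illustrative.

import numpy as np
from scipy.stats import multivariate_normal

def lambda_H(U, H):
    """Lambda_H(u) = |H|^{-1/2} Lambda(H^{-1/2} u) = phi_{4H}(u) - 2 phi_{3H}(u) + phi_{2H}(u)."""
    d = H.shape[0]
    phi = lambda c: multivariate_normal(mean=np.zeros(d), cov=c * H).pdf(U)
    return phi(4.0) - 2.0 * phi(3.0) + phi(2.0)

def aisb_hat(X, Y, H):
    """(1/n^2) * sum_{i != j} Lambda_H(x_i - x_j) * Y_i * Y_j  (assumed form, see lead-in)."""
    n, d = X.shape
    diffs = (X[:, None, :] - X[None, :, :]).reshape(-1, d)  # all pairwise differences x_i - x_j
    L = lambda_H(diffs, H).reshape(n, n)
    np.fill_diagonal(L, 0.0)                                # drop the i = j terms
    return Y @ L @ Y / n**2

As described in the abstract, the bandwidth matrix is then chosen by relating this estimate of the asymptotic integrated square bias to the asymptotic integrated variance; that balancing step belongs to the main text and is not reproduced here.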

Sketch of the proof of Lemma 1:

Proof

In order to show that \(\widehat{\psi }_{4,0}\) is an asymptotically unbiased estimator of \(\psi _{4,0}\), we evaluate \(E(\widehat{\psi }_{4,0})\):

$$\begin{aligned} \begin{aligned} E(\widehat{\psi }_{4,0})&=\frac{1}{n^2}\sum _{\begin{array}{c} i,j=1 \\ i\ne j \end{array}}^n \frac{\partial ^4 K_\mathbf {G}}{\partial x_1^4}(\mathbf {x}_i-\mathbf {x}_j)m(\mathbf {x}_i)m(\mathbf {x}_j)\\&=\iint \frac{\partial ^4 K_\mathbf {G}}{\partial y_1^4}(\mathbf {x}-\mathbf {y})m(\mathbf {y})m(\mathbf {x})d\mathbf {y}d\mathbf {x}+ O\left( n^{-1}\right) \\&=\iint \frac{\partial ^4 m(\mathbf {y})}{\partial y_1^4}K_\mathbf {G}(\mathbf {x}- \mathbf {y})m(\mathbf {x})d\mathbf {y}d\mathbf {x}+ O\left( n^{-1}\right) \\&=\iint \frac{\partial ^4 m(\mathbf {y})}{\partial y_1^4}K(\mathbf {z})m\left( \mathbf {y}+ \mathbf {G}^{1/2}\mathbf {z}\right) d\mathbf {y}d\mathbf {z}+ O\left( n^{-1}\right) \\ \end{aligned} \end{aligned}$$

Applying the Taylor expansion from \(5^{\circ }\) yields

$$\begin{aligned}&= \iint K(\mathbf {z})\Biggl (\sum _{i=0}^2\frac{(-1)^i}{i!}\{(\mathbf {z}^T\mathbf {G}^{1/2}D)^i\}m(\mathbf {y}) \\&\phantom {==}+ o(||\mathbf {G}^{1/2}\mathbf {z}||^2)\Biggr )\frac{\partial ^4 m(\mathbf {y})}{\partial y_1^4}d\mathbf {y}d\mathbf {z}+ O\left( n^{-1}\right) \\&= \int \frac{\partial ^4 m(\mathbf {y})}{\partial y_1^4} m(\mathbf {y})d\mathbf {y}+ \frac{\beta _2}{2}\int \{{{\mathrm{tr}}}(GD^2)\} m(\mathbf {x}) d\mathbf {x}+ O\left( n^{-1}\right) \\&=\,\psi _{4,0} + \frac{\beta _2}{2}\int \{{{\mathrm{tr}}}(GD^2)\} m(\mathbf {x}) d\mathbf {x}+ O\left( n^{-1}\right) . \end{aligned}$$

To finish the proof of Lemma 1, it is sufficient to evaluate \({{\mathrm{Var}}}(\widehat{\psi }_{4,0})\):

$$\begin{aligned} \begin{aligned} {{\mathrm{Var}}}(\widehat{\psi }_{4,0})&=\frac{1}{n^4}\sum _{\begin{array}{c} i,j=1 \\ i\ne j \end{array}}^n \left( \frac{\partial ^4 K_\mathbf {G}}{\partial x_1^4}(\mathbf {x}_i-\mathbf {x}_j)\right) ^2{{\mathrm{Var}}}Y_iY_j\\&=\frac{\sigma ^4}{n^4}\sum _{\begin{array}{c} i,j=1 \\ i\ne j \end{array}}^n \left( \frac{\partial ^4 K_\mathbf {G}}{\partial x_1^4}(\mathbf {x}_i-\mathbf {x}_j)\right) ^2. \end{aligned} \end{aligned}$$

Since the estimator \(\widehat{\psi }_{4,0}\) is asymptotically unbiased and its variance tends to zero, the estimator is consistent. \(\square \)
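Analogously, here is a minimal sketch of the functional estimate \(\widehat{\psi }_{4,0}\) treated above; it is not taken from the paper and assumes the form \(\widehat{\psi }_{4,0}=\frac{1}{n^2}\sum _{i\ne j}\frac{\partial ^4 K_{\mathbf {G}}}{\partial x_1^4}(\mathbf {x}_i-\mathbf {x}_j)Y_iY_j\) with a Gaussian kernel \(K_{\mathbf {G}}=\phi _{\mathbf {G}}\), whose fourth partial derivative has the closed form \((a^4-6a^2s+3s^2)\,\phi _{\mathbf {G}}(\mathbf {u})\) with \(a=(\mathbf {G}^{-1}\mathbf {u})_1\) and \(s=(\mathbf {G}^{-1})_{11}\); function names are again illustrative.

import numpy as np
from scipy.stats import multivariate_normal

def d4_KG_dx1(U, G):
    """Fourth partial derivative of the N(0, G) density with respect to the first coordinate,
    (a^4 - 6 a^2 s + 3 s^2) * phi_G(u), with a = (G^{-1} u)_1 and s = (G^{-1})_{11}."""
    Ginv = np.linalg.inv(G)
    a = U @ Ginv[:, 0]        # first component of G^{-1} u (Ginv is symmetric)
    s = Ginv[0, 0]
    phi = multivariate_normal(mean=np.zeros(G.shape[0]), cov=G).pdf(U)
    return (a**4 - 6.0 * s * a**2 + 3.0 * s**2) * phi

def psi_40_hat(X, Y, G):
    """(1/n^2) * sum_{i != j} d^4 K_G / dx_1^4 (x_i - x_j) * Y_i * Y_j  (assumed form, see lead-in)."""
    n, d = X.shape
    diffs = (X[:, None, :] - X[None, :, :]).reshape(-1, d)
    D4 = d4_KG_dx1(diffs, G).reshape(n, n)
    np.fill_diagonal(D4, 0.0)
    return Y @ D4 @ Y / n**2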

Cite this article

Koláček, J., Horová, I. Bandwidth matrix selectors for kernel regression. Comput Stat 32, 1027–1046 (2017). https://doi.org/10.1007/s00180-017-0709-3
