A plug-in bandwidth selector for nonparametric quantile regression

Abstract

In the framework of quantile regression, local linear smoothing techniques have been studied by several authors, particularly by Yu and Jones (J Am Stat Assoc 93:228–237, 1998). The problem of bandwidth selection was addressed in the literature by the usual approaches, such as cross-validation or plug-in methods. Most of the plug-in methods rely on restrictive assumptions on the quantile regression model in relation to the mean regression, or on parametric assumptions. Here we present a plug-in bandwidth selector for nonparametric quantile regression that is defined from a completely nonparametric approach. To this end, the curvature of the quantile regression function and the integrated squared sparsity (inverse of the conditional density) are both nonparametrically estimated. The new bandwidth selector is shown to work well in different simulated scenarios, particularly when the conditions commonly assumed in the literature are not satisfied. A real data application is also given.



References

  1. Abberger K (1998) Cross-validation in nonparametric quantile regression. Allgemeines Statistisches Archiv 82:149–161

  2. Abberger K (2002) Variable data driven bandwidth choice in nonparametric quantile regression. Technical Report

  3. Bloch DA, Gastwirth JL (1968) On a simple estimate of the reciprocal of the density function. Ann Math Stat 39:1083–1085

  4. Bofinger E (1975) Estimation of a density function using order statistics. Aust J Stat 17:1–7

  5. Conde-Amboage M (2017) Statistical inference in quantile regression models. Ph.D. thesis, Universidade de Santiago de Compostela, Spain

  6. El Ghouch A, Genton MG (2009) Local polynomial quantile regression with parametric features. J Am Stat Assoc 104:1416–1429

  7. Fan J, Hu TC, Truong YK (1994) Robust nonparametric function estimation. Scand J Stat 21:433–446

  8. Hall P, Sheather SJ (1988) On the distribution of a studentized quantile. J R Stat Soc Ser B (Methodol) 50:381–391

  9. Jones MC (1991) The roles of ISE and MISE in density estimation. Stat Probab Lett 12:51–56

  10. Jones MC, Yu K (2007) Improved double kernel local linear quantile regression. Stat Model 7:377–389

  11. Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge

  12. Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50

  13. Mallows CL (1973) Some comments on \(C_p\). Technometrics 15:661–675

  14. Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7:308–313

  15. Opsomer JD, Ruppert D (1998) A fully automated bandwidth selection method for fitting additive models. J Am Stat Assoc 93:605–618

  16. Ruppert D, Sheather SJ, Wand MP (1995) An effective bandwidth selector for local least squares regression. J Am Stat Assoc 90:1257–1270

  17. Sánchez-Sellero C, González-Manteiga W, Cao R (1999) Bandwidth selection in density estimation with truncated and censored data. Ann Inst Stat Math 51:51–70

  18. Sheather SJ, Maritz JS (1983) An estimate of the asymptotic standard error of the sample median. Aust J Stat 25:109–122

  19. Siddiqui MM (1960) Distribution of quantiles in samples from a bivariate population. J Res Natl Bur Stand 64:145–150

  20. Tukey JW (1965) Which part of the sample contains the information? Proc Natl Acad Sci 53:127–134

  21. Venables WN, Ripley BD (1999) Modern applied statistics with S-PLUS, 3rd edn. Springer, New York

  22. Yu K, Jones MC (1998) Local linear quantile regression. J Am Stat Assoc 93:228–237

  23. Yu K, Lu Z (2004) Local linear additive quantile regression. Scand J Stat 31:333–346


Acknowledgements

The authors gratefully acknowledge the support of Projects MTM2013–41383–P (Spanish Ministry of Economy, Industry and Competitiveness) and MTM2016–76969–P (Spanish State Research Agency, AEI), both co-funded by the European Regional Development Fund (ERDF). Support from the IAP network StUDyS, from Belgian Science Policy, is also acknowledged. Work of M. Conde-Amboage has been supported by FPU grant AP2012-5047 from the Spanish Ministry of Education. We are grateful to two anonymous referees for their constructive comments, which helped to improve the paper.

Author information

Correspondence to César Sánchez-Sellero.

Appendix: Mean squared error of curvature and sparsity estimators


Here expressions (4) and (5) are derived. They give approximations to the mean squared error of curvature and sparsity estimators, respectively. A complete development of these expressions can be seen in Chapter 3 of Conde-Amboage (2017).

Derivation of (4)

In order to derive the asymptotic mean squared error of the curvature estimator, the following assumptions will be needed:

C1:

The density function of the explanatory variable X, denoted by g, is differentiable, and its first derivative is a bounded function.

C2:

The kernel function K is symmetric and nonnegative, has bounded support and satisfies \(\int K(u) \; \hbox {d}u =1\), \(\mu _6(K)=\int u^6 K(u) \; \hbox {d}u < \infty \) and \(\int K^2(u) \; \hbox {d}u < \infty \). Moreover, the bandwidth parameter \(h_\mathrm{c}\) is assumed to satisfy \(h_\mathrm{c} \rightarrow 0\) and \(nh_\mathrm{c}^{5} \rightarrow \infty \) as \(n \rightarrow \infty \).

C3:

The conditional distribution function \(F(y|X=x)\) of the response variable is three times differentiable in y for each x, and its first derivative satisfies \(F^{(1)}(q_{\tau }(x)|X=x)=f(q_{\tau }(x)|X=x)\ne 0\). Moreover, there exist positive constants \(c_1\) and \(c_2\) and a positive function \(\text{ Bound }(y|X=x)\) such that

$$\begin{aligned} \sup _{|x_n-x|<c_1} f(y|X=x_n) \le \text{ Bound }(y|X=x) \end{aligned}$$

and

$$\begin{aligned}&\int |\psi _{\tau }(y-q_{\tau }(x))|^{2+\delta } \; \text{ Bound }(y|X=x) \; \hbox {d}y<\infty \\&\int (\rho _{\tau }(y-t)-\rho _{\tau }(y)-\psi _{\tau }(y)t)^{2} \; \hbox { Bound}(y|X=x) \; \hbox {d}y=o(t^2), \quad \text{ as }\ t \rightarrow 0 \end{aligned}$$

where \(\psi _{\tau }(r)=\tau {\mathbb {I}}(r>0)+(\tau -1){\mathbb {I}}(r<0)\).

C4:

The function \(q_{\tau _1}(x)\) has a continuous fourth derivative with respect to x for any \(\tau _1\) in a neighbourhood of \(\tau \). These derivatives will be denoted by \(q_{\tau }^{(i)}\) with \(i \in \{1,2,3,4\}\). Moreover, all these derivatives are bounded functions in a neighbourhood of \(\tau \).

Applying the arguments of the proof of Theorem 3 in Fan et al. (1994) to a local polynomial of order 3, the estimator of the second derivative can be approximated by

$$\begin{aligned} {\widetilde{q}}_{\tau ,h_\mathrm{c}}^{(2)}(x)\cong q_\tau ^{(2)}(x)+2h_\mathrm{c}^{-2}\frac{1}{f(q_\tau (x)|x)g(x)}V_{n,\tau }(x) \end{aligned}$$

where

$$\begin{aligned} V_{n,\tau }(x)=\frac{1}{nh_\mathrm{c}}\sum _{i=1}^n \psi _\tau \left( Y_i^{(3)}\right) \left( \alpha _{31}+\alpha _{33}\left( \frac{X_i-x}{h_\mathrm{c}}\right) ^2\right) K\left( \frac{x-X_i}{h_\mathrm{c}}\right) , \end{aligned}$$

where \(Y_i^{(3)}=Y_i-q_\tau (x)-q_\tau ^{(1)}(x)(X_i-x)-(1/2)q_\tau ^{(2)}(x)(X_i-x)^2-(1/6)q_\tau ^{(3)}(x)(X_i-x)^3\) and \(\psi _\tau (z)=\tau -{\mathbb {I}}(z<0)\). Note that assumptions C1-C4 were used here.
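The local cubic fit behind \({\widetilde{q}}_{\tau ,h_\mathrm{c}}^{(2)}\) can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the kernel-weighted check loss is minimized by Nelder–Mead from a least-squares warm start (in practice a linear-programming quantile solver would be standard), and the Epanechnikov kernel and all function names are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(r, tau):
    """Quantile check loss rho_tau(r) = r * (tau - I(r < 0))."""
    return r * (tau - (r < 0))

def local_cubic_second_deriv(x0, X, Y, tau, h):
    """Estimate q_tau''(x0) from a local cubic quantile fit (sketch)."""
    u = (X - x0) / h
    w = np.maximum(0.75 * (1 - u**2), 0.0)           # Epanechnikov weights
    D = np.vander(X - x0, 4, increasing=True)        # columns: 1, d, d^2, d^3
    # warm start from a kernel-weighted least-squares cubic fit
    b0 = np.linalg.lstsq(D * np.sqrt(w)[:, None], Y * np.sqrt(w), rcond=None)[0]
    obj = lambda b: np.sum(w * check_loss(Y - D @ b, tau))
    b = minimize(obj, b0, method="Nelder-Mead").x
    return 2.0 * b[2]                                # q_tau''(x0) = 2 * b2

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 400)
Y = X**2 + 0.1 * rng.normal(size=400)                # true curvature equals 2
print(local_cubic_second_deriv(0.0, X, Y, 0.5, 0.5))
```

On this quadratic median function the returned value should be close to the true curvature 2.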

Now, expectation and variance of \({\widetilde{q}}_{\tau ,h_\mathrm{c}}^{(2)}(x)\) can be obtained by some algebraic calculations:

$$\begin{aligned}&{\mathbb {E}}\left( {\widetilde{q}}_{\tau ,h_\mathrm{c}}^{(2)}(x)\right) \cong q_\tau ^{(2)}(x)+\frac{1}{2}\delta _1 q_\tau ^{(4)}(x)h_\mathrm{c}^2 \\&{\mathbb {V}}\text{ ar }\left( {\widetilde{q}}_{\tau ,h_\mathrm{c}}^{(2)}(x)\right) \cong \delta _2\frac{1}{nh_\mathrm{c}^5}\frac{\tau (1-\tau )}{f(q_\tau (x)|x)^2g(x)} \end{aligned}$$

where \(\delta _1\) and \(\delta _2\) were defined in expression (4). Recall that the curvature estimator is given by

$$\begin{aligned} {\widehat{\vartheta }}_{h_\mathrm{c}}=\frac{1}{n} \sum _{i=1}^{n}{\widetilde{q}}^{(2)}_{\tau ,h_\mathrm{c}}(X_{i})^2. \end{aligned}$$

Then, combining expectation and variance of \({\widetilde{q}}_{\tau ,h_\mathrm{c}}^{(2)}(X_i)\) conditionally to \(X_i\), and taking expectation with respect to \(X_i\), we obtain

$$\begin{aligned} {\mathbb {E}}\left( {\widehat{\vartheta }}_{h_\mathrm{c}}\right)&\cong \vartheta +\delta _1 \; h_\mathrm{c}^2 \; \int {q_{\tau }^{(2)}(x)q_{\tau }^{(4)}(x)g(x)}\,\hbox {d}x \\&\quad +\delta _2 \; \tau (1-\tau ) \; \frac{1}{nh_\mathrm{c}^5} \; \int {\frac{1}{f(q_{\tau }(x)|x)^2}\,\hbox {d}x} \end{aligned}$$

Additional calculations, which can be found in Conde-Amboage (2017), show that the dominant terms in the variance of \({\widehat{\vartheta }}_{h_\mathrm{c}}\) are of orders \(n^{-1}\) and \(n^{-2} \, h_\mathrm{c}^{-9}\). The term of order \(n^{-1}\) does not depend on \(h_\mathrm{c}\), while the term of order \(n^{-2}\,h_\mathrm{c}^{-9}\) is negligible with respect to the asymptotic squared bias. Consequently, the asymptotically optimal bandwidth can be obtained by minimizing the asymptotic squared bias. This fact, together with the last expression for \({\mathbb {E}}\left( {\widehat{\vartheta }}_{h_\mathrm{c}}\right) \), leads to expression (4).
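This minimization step can be mimicked numerically. The sketch below uses hypothetical values for the pilot quantities (\(\delta_1\), \(\delta_2\) and the two integrals, here B and V) and minimizes the squared asymptotic bias over \(h_\mathrm{c}\) on the log scale; when the two bias terms have opposite signs, the minimizer coincides with the root \(h_\mathrm{c}^7 = -\delta_2\tau(1-\tau)V/(\delta_1 B n)\).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def optimal_hc(n, tau, B, V, delta1, delta2):
    """Minimize the squared asymptotic bias of the curvature estimator,
       (delta1*h^2*B + delta2*tau*(1-tau)*V/(n*h^5))^2, over h > 0.
    B and V stand for the integrals of q''q''''g and 1/f^2 (pilot values)."""
    bias2 = lambda lh: (delta1 * np.exp(2 * lh) * B
                        + delta2 * tau * (1 - tau) * V / (n * np.exp(5 * lh)))**2
    res = minimize_scalar(bias2, bounds=(np.log(1e-3), np.log(10.0)),
                          method="bounded")
    return np.exp(res.x)

# hypothetical pilot values, for illustration only
h = optimal_hc(n=500, tau=0.5, B=-1.2, V=0.8, delta1=0.3, delta2=0.05)
print(h)
```

With these illustrative pilots the bias terms have opposite signs, so the numerical minimizer agrees with the closed-form root above.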

Derivation of (5)

The following conditions will be assumed in order to derive the asymptotic mean squared error of the sparsity estimator:

S1:

The conditional density function \(f(y|X=x)\) of the response variable is twice differentiable in y for each x, and \(f^{(i)}(q_{\tau }(x)|X=x)\ne 0\) for \(i=0,1,2\). Moreover, there exist positive constants \(c_1\) and \(c_2\) and a positive function \(\text{ Bound }(y|X=x)\) such that

$$\begin{aligned} \sup _{|x_n-x|<c_1} f(y|X=x_n) \le \text{ Bound }(y|X=x) \end{aligned}$$

and

$$\begin{aligned}&\int |\psi _{\tau }(y-q_{\tau }(x))|^{2+\delta } \; \text{ Bound }(y|X=x) \; \hbox {d}y<\infty \\&\int (\rho _{\tau }(y-t)-\rho _{\tau }(y)-\psi _{\tau }(y)t)^{2} \; \hbox { Bound}(y|X=x) \; \hbox {d}y=o(t^2), \quad \text{ as }\ t \rightarrow 0 \end{aligned}$$

where \(\psi _{\tau }(r)=\tau {\mathbb {I}}(r>0)+(\tau -1){\mathbb {I}}(r<0)\).

S2:

The function \(q_{\tau _1}\), as a function of x, has a continuous second derivative for any \(\tau _1\) in a neighbourhood of \(\tau \). These derivatives will be denoted by \(q_{\tau }^{(i)}\). Moreover, all these functions are bounded in a neighbourhood of \(\tau \).

S3:

The density function of the explanatory variable X, denoted by g, is differentiable, and its first derivative is a bounded function.

S4:

The kernel K is symmetric and nonnegative, has bounded support and satisfies \(\int K(u) \; \hbox {d}u = 1\), \(\int K(u)^2 \; \hbox {d}u < \infty \) and \(\mu _2(K)<\infty \). Moreover, the bandwidth parameters satisfy \(d_\mathrm{s} \rightarrow 0\), \(h_\mathrm{s} \rightarrow 0\) and \(nd_\mathrm{s}h_\mathrm{s} \rightarrow \infty \) as \(n \rightarrow \infty \).

S5:

The function \(q_{\tau _1}\) has a continuous and bounded fourth derivative with respect to \(\tau _1\) for any \(\tau _1\) in a neighbourhood of \(\tau \). Moreover, \(q_{\tau _1}^{(2)}\) has a continuous and bounded second derivative with respect to \(\tau _1\) for any \(\tau _1\) in a neighbourhood of \(\tau \).

Recall the definition of the proposed sparsity estimator

$$\begin{aligned} {\widehat{s}}_{\tau ,d_\mathrm{s},h_\mathrm{s}}(x)=\frac{{\widehat{q}}_{\tau +d_\mathrm{s},h_\mathrm{s}}(x)-{\widehat{q}}_{\tau -d_\mathrm{s},h_\mathrm{s}}(x)}{2\,d_\mathrm{s}} \end{aligned}$$

where \({\widehat{q}}_{\tau +d_\mathrm{s},h_\mathrm{s}}\) and \({\widehat{q}}_{\tau -d_\mathrm{s},h_\mathrm{s}}\) are local linear quantile regression estimates at the quantile orders \((\tau +d_\mathrm{s})\) and \((\tau -d_\mathrm{s})\), respectively, and \(h_\mathrm{s}\) denotes their bandwidth. Applying the results of Fan et al. (1994), we have

$$\begin{aligned} {\widehat{q}}_{\tau +d_\mathrm{s},h_\mathrm{s}}(x)\cong q_{\tau +d_\mathrm{s}}(x)+\frac{1}{f(q_{\tau +d_\mathrm{s}}(x)|x)g(x)}U_{\tau +d_\mathrm{s},h_\mathrm{s}}(x) \end{aligned}$$

where

$$\begin{aligned} U_{\tau +d_\mathrm{s},h_\mathrm{s}}(x)=\frac{1}{nh_\mathrm{s}}\sum _{i=1}^n \psi _{\tau +d_\mathrm{s}}\left( Y_i^{(1)}\right) K\left( \frac{x-X_i}{h_\mathrm{s}}\right) , \end{aligned}$$
(10)

and \(Y_i^{(1)}=Y_i-q_{\tau +d_\mathrm{s}}(x)-q_{\tau +d_\mathrm{s}}^{(1)}(x)(X_i-x)\). An analogous expression holds for \({\widehat{q}}_{\tau -d_\mathrm{s},h_\mathrm{s}}(x)\).
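The difference-quotient construction of the sparsity estimator can be sketched as follows. Again this is a minimal illustration, not the authors' code: the local linear quantile fits are solved by Nelder–Mead from a least-squares warm start, and the kernel, sample size and function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(r, tau):
    """Quantile check loss rho_tau(r) = r * (tau - I(r < 0))."""
    return r * (tau - (r < 0))

def local_linear_quantile(x0, X, Y, tau, h):
    """Local linear quantile fit at x0; returns the intercept q_tau(x0)."""
    u = (X - x0) / h
    w = np.maximum(0.75 * (1 - u**2), 0.0)           # Epanechnikov weights
    D = np.column_stack([np.ones_like(X), X - x0])
    b0 = np.linalg.lstsq(D * np.sqrt(w)[:, None], Y * np.sqrt(w), rcond=None)[0]
    obj = lambda b: np.sum(w * check_loss(Y - D @ b, tau))
    return minimize(obj, b0, method="Nelder-Mead").x[0]

def sparsity_hat(x0, X, Y, tau, ds, hs):
    """Difference quotient of the tau+ds and tau-ds local linear fits."""
    qp = local_linear_quantile(x0, X, Y, tau + ds, hs)
    qm = local_linear_quantile(x0, X, Y, tau - ds, hs)
    return (qp - qm) / (2.0 * ds)

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, 600)
Y = np.sin(X) + rng.normal(scale=0.5, size=600)
# true sparsity at tau=0.5 with N(0, 0.5^2) errors: 1/f(0) = 0.5*sqrt(2*pi)
print(sparsity_hat(0.0, X, Y, 0.5, 0.1, 0.4))
```

The printed value should lie in the vicinity of \(0.5\sqrt{2\pi}\approx 1.25\), up to the sampling noise of this small illustration.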

Substituting these expressions in the definition of \({\widehat{s}}_{\tau ,d_\mathrm{s},h_\mathrm{s}}(x)\), we have

$$\begin{aligned} {\widehat{s}}_{\tau ,d_\mathrm{s},h_\mathrm{s}}(x)=A(x)+B(x) \end{aligned}$$
(11)

with

$$\begin{aligned} A(x){=}\frac{q_{\tau +d_\mathrm{s}}(x)-q_{\tau -d_\mathrm{s}}(x)}{2\,d_\mathrm{s}}, \quad B(x)=\frac{1}{g(x)}\left( \frac{U_{\tau +d_\mathrm{s},h_\mathrm{s}}(x)}{f(q_{\tau +d_\mathrm{s}}(x)|x)}-\frac{U_{\tau -d_\mathrm{s},h_\mathrm{s}}(x)}{f(q_{\tau -d_\mathrm{s}}(x)|x)}\right) . \end{aligned}$$

Note that A(x) is not random and, under assumption S2, can be approximated by a Taylor expansion as

$$\begin{aligned} A(x)\cong s_\tau (x)+\frac{1}{6}s_\tau ^{(2,\tau )}(x)d_\mathrm{s}^2 \end{aligned}$$

Moreover, based on arguments developed in Lemma 2 of Fan et al. (1994), the expectation and variance of B(x) can be approximated by

$$\begin{aligned}&{\mathbb {E}}(B(x))\cong \frac{1}{2} \mu _2(K) \frac{\partial q_{\tau }^{(2)}(x)}{\partial \tau }h_\mathrm{s}^2 \\&{\mathbb {V}}\text{ ar }(B(x))\cong \frac{1}{2nd_\mathrm{s}h_\mathrm{s}}\frac{\int K^2(u)\,\hbox {d}u}{f(q_{\tau }(x)|x)g(x)} \end{aligned}$$

provided that assumptions S1–S5 hold.

From these results, the asymptotic squared bias of the estimated squared sparsity is given by

$$\begin{aligned} \text{ Bias } \left( \int {{\widehat{s}}^2_{\tau ,d_\mathrm{s},h_\mathrm{s}}(x)\,\hbox {d}x}\right)&\cong \left[ \frac{1}{nd_\mathrm{s} h_\mathrm{s}}\int {a(x)\,\hbox {d}x}+d_\mathrm{s}^2\int {b(x)\,\hbox {d}x}+h_\mathrm{s}^2\int {c(x)\,\hbox {d}x}\right] ^2 \end{aligned}$$

where a(x), b(x) and c(x) are given in (6).

In view of expression (11), the asymptotic variance of the sparsity estimator can be decomposed as follows:

$$\begin{aligned} {\mathbb {V}}\text{ ar } \left[ \int {\widehat{s}}_{\tau ,d_\mathrm{s},h_\mathrm{s}}(x)^2 \; \hbox {d}x \right]&\cong {\mathbb {V}}\text{ ar } \left[ \int \left( A(x)^2+B(x)^2+2A(x)B(x) \right) \; \hbox {d}x \right] \nonumber \\&={\mathbb {V}}\text{ ar } \left[ \int B(x)^2\; \hbox {d}x \right] +4 \; {\mathbb {V}}\text{ ar } \left[ \int A(x)B(x) \; \hbox {d}x \right] \nonumber \\&\quad +4 \; {\mathbb {C}}\text{ ov } \left[ \int B(x)^2 \; \hbox {d}x ,\int A(x)B(x)\; \hbox {d}x \right] . \end{aligned}$$

Each of the previous terms can be expressed in terms of covariances of U-expressions like that given in (10), evaluated at different points x and at the quantile orders \(\tau +d_\mathrm{s}\) and \(\tau -d_\mathrm{s}\). These covariances can be computed (under assumptions S1, S2, S4 and S5) using arguments similar to those employed by Fan et al. (1994), adapting their \(\varphi \) function (given in equation (2.1) on p. 435) to each covariance. Then, the asymptotic variance of the sparsity estimator can be approximated as follows:

$$\begin{aligned} {\mathbb {V}}\text{ ar } \left[ \int {\widehat{s}}_{\tau ,d_\mathrm{s},h_\mathrm{s}}(x)^2 \; \hbox {d}x \right]&\cong \frac{1}{nd_\mathrm{s}} \int d(x) \; \hbox {d}x +\frac{1}{n^2 d_\mathrm{s}^2 h_\mathrm{s}} \; \int e(x) \; \hbox {d}x \end{aligned}$$

where d(x) and e(x) are given in (6). Then, in view of the computed asymptotic bias and variance, expression (5) can be derived.
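Once pilot estimates of the integrals of a, b, c, d and e in (6) are available, the resulting asymptotic MSE can be minimized jointly in \((d_\mathrm{s}, h_\mathrm{s})\), for instance by the Nelder–Mead simplex method cited in the references. The sketch below uses hypothetical pilot values (the five integrals are plain numbers here) and optimizes on the log scale to keep both bandwidths positive.

```python
import numpy as np
from scipy.optimize import minimize

def amse_sparsity(params, n, Ia, Ib, Ic, Id, Ie):
    """Asymptotic MSE of the integrated squared sparsity estimator:
       squared bias [Ia/(n*ds*hs) + Ib*ds^2 + Ic*hs^2]^2
       plus variance Id/(n*ds) + Ie/(n^2*ds^2*hs)."""
    ds, hs = np.exp(params)          # log scale keeps ds, hs > 0
    bias2 = (Ia / (n * ds * hs) + Ib * ds**2 + Ic * hs**2)**2
    var = Id / (n * ds) + Ie / (n**2 * ds**2 * hs)
    return bias2 + var

# hypothetical pilot integrals of a, b, c, d, e in (6), for illustration only
n, pilots = 500, (0.4, 1.1, 0.9, 0.6, 0.3)
res = minimize(amse_sparsity, x0=np.log([0.1, 0.3]), args=(n, *pilots),
               method="Nelder-Mead")
ds, hs = np.exp(res.x)
print(ds, hs)
```

Working on the log scale is a simple alternative to constrained optimization: the simplex moves freely while the implied bandwidths remain strictly positive.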


Cite this article

Conde-Amboage, M., Sánchez-Sellero, C. A plug-in bandwidth selector for nonparametric quantile regression. TEST 28, 423–450 (2019). https://doi.org/10.1007/s11749-018-0582-6


Keywords

  • Quantile regression
  • Bandwidth
  • Nonparametric regression

Mathematics Subject Classification

  • 62G08