
Adaptive varying-coefficient linear quantile model: a profiled estimating equations approach

Published in: Annals of the Institute of Statistical Mathematics

Abstract

We consider an estimating equations approach to parameter estimation in the adaptive varying-coefficient linear quantile model. We propose estimating equations for the index vector of the model, in which the unknown nonparametric functions are estimated by minimizing the check loss function, resulting in a profiled approach. The estimating equations have a bias-corrected form that makes undersmoothing of the nonparametric part unnecessary, and they allow the estimates to be obtained with a simple fixed-point algorithm. We establish the asymptotic properties of the estimator using empirical process theory, with additional complications due to the nuisance nonparametric part. The finite-sample performance of the proposed method is illustrated using simulation studies and a forest fire dataset.
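The profiled step described above minimizes the quantile check loss. As a minimal illustration (my own toy example, not code from the paper), the following sketch defines the check loss \(\rho _\tau (u)=u(\tau -I\{u<0\})\) and verifies numerically that minimizing it recovers a sample quantile:

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

# Minimizing sum_i rho_tau(y_i - q) over q recovers the tau-th sample quantile,
# which is the property the profiled estimation of the nonparametric part uses.
rng = np.random.default_rng(0)
y = rng.normal(size=2001)
tau = 0.9
grid = np.linspace(-3.0, 3.0, 6001)
losses = np.array([check_loss(y - q, tau).sum() for q in grid])
q_hat = grid[losses.argmin()]   # close to np.quantile(y, 0.9)
```

In the model itself the scalar quantile \(q\) is replaced by the spline-approximated coefficient functions, but the asymmetric weighting of residuals is the same.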


References

  • Belloni, A., Chernozhukov, V. (2011). ℓ1-penalized quantile regression in high-dimensional sparse models. The Annals of Statistics, 39, 82–130.

  • Bondell, H. D., Reich, B. J., Wang, H. (2010). Noncrossing quantile regression curve estimation. Biometrika, 97, 825–838.

  • Cai, Z., Xiao, Z. (2012). Semiparametric quantile regression estimation in dynamic models with partially varying coefficients. Journal of Econometrics, 167, 413–425.

  • Carroll, R. J., Fan, J., Gijbels, I., Wand, M. P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association, 92, 477–489.

  • Chen, R., Tsay, R. S. (1993). Functional-coefficient autoregressive models. Journal of the American Statistical Association, 88, 298–308.

  • Cortez, P., Morais, A. (2007). A data mining approach to predict forest fires using meteorological data. In J. Neves, M. F. Santos and J. Machado (Eds.), New trends in artificial intelligence, Proceedings of the 13th EPIA 2007 - Portuguese Conference on Artificial Intelligence, 512–523.

  • Cui, X., Haerdle, W. K., Zhu, L. (2011). The EFM approach for single-index models. The Annals of Statistics, 39, 1658–1688.

  • Fan, J., Fan, Y., Barut, E. (2014a). Adaptive robust variable selection. Annals of Statistics, 42, 324–351.

  • Fan, J., Ma, Y., Dai, W. (2014b). Nonparametric independence screening in sparse ultra-high dimensional varying coefficient models. Journal of the American Statistical Association, 109, 1270–1284.

  • Fan, J., Yao, Q., Cai, Z. (2003). Adaptive varying-coefficient linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65, 57–80.

  • Fan, J. Q., Zhang, J. T. (2000). Two-step estimation of functional linear models with applications to longitudinal data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62, 303–322.

  • Fan, J. Q., Zhang, W. Y. (1999). Statistical estimation in varying coefficient models. Annals of Statistics, 27, 1491–1518.

  • Hall, P., Sheather, S. J. (1988). On the distribution of a studentized quantile. Journal of the Royal Statistical Society: Series B (Methodological), 50, 381–391.

  • Hastie, T., Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society: Series B (Methodological), 55, 757–796.

  • He, X., Shi, P. (1994). Convergence rate of B-spline estimators of nonparametric conditional quantile functions. Journal of Nonparametric Statistics, 3, 299–308.

  • Hendricks, W., Koenker, R. (1992). Hierarchical spline models for conditional quantiles and the demand for electricity. Journal of the American Statistical Association, 87, 58–68.

  • Hoover, D. R., Rice, J. A., Wu, C. O., Yang, L. P. (1998). Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika, 85, 809–822.

  • Horowitz, J. L., Lee, S. (2005). Nonparametric estimation of an additive quantile regression model. Journal of the American Statistical Association, 100, 1238–1249.

  • Hu, Y., Gramacy, R., Lian, H. (2013). Bayesian quantile regression for single-index models. Statistics and Computing, 23, 437–454.

  • Huang, J. H. Z., Wu, C. O., Zhou, L. (2002). Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika, 89, 111–128.

  • Jiang, L., Wang, H. J., Bondell, H. D. (2013). Interquantile shrinkage in regression models. Journal of Computational and Graphical Statistics, 22, 970–986.

  • Kim, M. (2007). Quantile regression with varying coefficients. Annals of Statistics, 35, 92–108.


  • Koenker, R., Bassett, G., Jr. (1978). Regression quantiles. Econometrica, 46, 33–50.

  • Koenker, R., Ng, P., Portnoy, S. (1994). Quantile smoothing splines. Biometrika, 81, 673–680.

  • Kong, E., Xia, Y. (2012). A single-index quantile regression model and its estimation. Econometric Theory, 28, 730–768.

  • Kottas, A., Krnjajic, M. (2009). Bayesian semiparametric modelling in quantile regression. Scandinavian Journal of Statistics, 36, 297–319.

  • Lai, P., Wang, Q., Lian, H. (2012). Bias-corrected gee estimation and smooth-threshold gee variable selection for single-index models with clustered data. Journal of Multivariate Analysis, 105, 422–432.

  • Lee, S. (2003). Efficient semiparametric estimation of a partially linear quantile regression model. Econometric Theory, 19, 1–31.


  • Lian, H. (2012a). Semiparametric estimation of additive quantile regression models by two-fold penalty. Journal of Business & Economic Statistics, 30, 337–350.

  • Lian, H. (2012b). Variable selection for high-dimensional generalized varying-coefficient models. Statistica Sinica, 22, 1563–1588.


  • Liu, J., Li, R., Wu, R. (2014). Feature selection for varying coefficient models with ultrahigh-dimensional covariates. Journal of the American Statistical Association, 109, 266–274.

  • Lu, Z., Tjøstheim, D., Yao, Q. (2007). Adaptive varying-coefficient linear models for stochastic processes: asymptotic theory. Statistica Sinica, 17, 177–198.

  • Reich, B., Bondell, H., Wang, H. (2010). Flexible Bayesian quantile regression for independent and clustered data. Biostatistics, 11, 337–352.

  • Sherwood, B., Wang, L. (2016). Partially linear additive quantile regression in ultra-high dimension. Annals of Statistics, 44, 288–317.

  • Tang, Y., Wang, H. J., Zhu, Z. (2013). Variable selection in quantile varying coefficient models with longitudinal data. Computational Statistics & Data Analysis, 57, 435–449.

  • Tokdar, S., Kadane, J. B. (2011). Simultaneous linear quantile regression: A semiparametric Bayesian approach. Bayesian Analysis, 6, 1–22.

  • van de Geer, S. A. (2000). Empirical processes in M-estimation. Cambridge: Cambridge University Press.


  • van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge: Cambridge University Press.


  • van der Vaart, A. W., Wellner, J. A. (1996). Weak convergence and empirical processes. New York: Springer.

  • Wang, H. J., Zhu, Z., Zhou, J. (2009). Quantile regression in partially linear varying coefficient models. The Annals of Statistics, 37, 3841–3866.

  • Wang, J.-L., Xue, L., Zhu, L., Chong, Y. S. (2010). Estimation for a partial-linear single-index model. Annals of Statistics, 38, 246–274.

  • Wang, L., Wu, Y., Li, R. (2012). Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association, 107, 214–222.

  • Wei, F., Huang, J., Li, H. Z. (2011). Variable selection and estimation in high-dimensional varying-coefficient models. Statistica Sinica, 21, 1515–1540.

  • Wu, T. Z., Yu, K., Yu, Y. (2010). Single-index quantile regression. Journal of Multivariate Analysis, 101, 1607–1621.

  • Wu, Y., Liu, Y. (2009). Variable selection in quantile regression. Statistica Sinica, 19, 801–817.

  • Xia, Y., Li, W. (1999). On single-index coefficient regression models. Journal of the American Statistical Association, 94, 1275–1285.

  • Xue, L., Qu, A. (2012). Variable selection in high-dimensional varying-coefficient models with global optimality. The Journal of Machine Learning Research, 13, 1973–1998.

  • Yang, Y., He, X. (2012). Bayesian empirical likelihood for quantile regression. The Annals of Statistics, 40, 1102–1131.

  • Yu, K., Jones, M. (1998). Local linear quantile regression. Journal of the American Statistical Association, 93, 228–237.

  • Yu, K., Moyeed, R. (2001). Bayesian quantile regression. Statistics & Probability Letters, 54, 437–447.

  • Yu, Y., Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association, 97, 1042–1054.

  • Zhu, L., Huang, M., Li, R. (2012). Semiparametric quantile regression with high-dimensional covariates. Statistica Sinica, 22, 1379–1401.

  • Zhu, L., Lin, L., Cui, X., Li, G. (2010). Bias-corrected empirical likelihood in a multi-link semiparametric model. Journal of Multivariate Analysis, 101, 850–868.


Acknowledgements

We sincerely thank the Associate Editor and two anonymous reviewers for their insightful comments, which have led to a significant improvement of the paper. The research of Zhao was supported in part by the National Social Science Fund of China (15BTJ027), and the research of Lian was supported by a start-up grant (No. 7200521/MA) from the City University of Hong Kong.

Author information


Corresponding author

Correspondence to Weihua Zhao.

Appendix: Technical proofs


In our proofs, C denotes a generic positive constant which can take different values even on the same line.

Proof of Theorem 1

Consider the class of functions

$$\begin{aligned} \mathcal {F}= & {} \left\{ \left( \mathbf{x}\mathbf{z}^{\mathrm{T}}\mathbf{g}'(\mathbf{x}^{\mathrm{T}}\varvec{\beta })-\mathbf{H}(\mathbf{x}^{\mathrm{T}}\varvec{\beta })\mathbf{z}\right) \left( \tau -I\left\{ y-\mathbf{g}^{\mathrm{T}}(\mathbf{x}^{\mathrm{T}}\varvec{\beta })\mathbf{z}\le 0\right\} \right) , \varvec{\beta }\in \mathcal {B}, \right. \\&\hbox { and for some } \alpha>3/2, \alpha '>1/2, \hbox { and } M>0, \\&\left. \hbox {entries of } \mathbf{g}\hbox { are in } \mathcal {C}^{\alpha }(M), \hbox { and entries of } \mathbf{H}\hbox { are in } \mathcal {C}^{\alpha '}(M)\right\} . \end{aligned}$$

For \(\alpha >1/2\), define

$$\begin{aligned} \mathcal {F}_1=\left\{ h(\mathbf{x}^{\mathrm{T}}\varvec{\beta }):\varvec{\beta }\in \mathcal {B},h\in \mathcal {C}^{\alpha }(M)\right\} . \end{aligned}$$

For any fixed \(\varvec{\beta }\), the class \(\mathcal {F}_1(\varvec{\beta }):=\{h(\mathbf{x}^{\mathrm{T}}\varvec{\beta }): h\in \mathcal {C}^\alpha (M) \}\) has entropy \(\log N(\delta ,\mathcal {C}^{\alpha }(M),\Vert .\Vert _\infty )\le C\delta ^{-1/\alpha }\) by Theorem 2.7.1 of van der Vaart and Wellner (1996). Since \(\Vert h(\mathbf{x}^{\mathrm{T}}\varvec{\beta })-h(\mathbf{x}^{\mathrm{T}}\varvec{\beta }')\Vert _\infty \le C\Vert \varvec{\beta }-\varvec{\beta }'\Vert ^{s}\) for some \(s>1/2\), it is easy to see that the \(\delta \)-entropy of \(\mathcal {F}_1\) in the \(L_\infty \) norm is bounded by the sum of \(C\delta ^{-1/\alpha }\) and the \(\delta ^{1/s}\)-entropy of \(\mathcal {B}\) in the Euclidean norm, the latter being \(C\log (1/\delta )\). Thus, by Theorem 19.14 of van der Vaart (1998) and the fact that \(\mathbf{x}\) lies in a compact set, \(\mathcal {F}_1\) is a Donsker class.

Furthermore, consider

$$\begin{aligned} \mathcal {F}_2=\left\{ I\{y-\mathbf{g}^{\mathrm{T}}(\mathbf{x}^{\mathrm{T}}\varvec{\beta })\mathbf{z}\le 0\}: \varvec{\beta }\in \mathcal {B},\mathbf{g}\in \mathcal {C}^{\alpha }(M)\right\} . \end{aligned}$$

We have

$$\begin{aligned}&E\left( I\left\{ y-\mathbf{g}_1^{\mathrm{T}}(\mathbf{x}^{\mathrm{T}}\varvec{\beta }_1)\mathbf{z}\le 0\right\} -I\left\{ y-\mathbf{g}_2^{\mathrm{T}}(\mathbf{x}^{\mathrm{T}}\varvec{\beta }_2)\mathbf{z}\le 0\right\} \right) ^2\\&\qquad \le C E|\mathbf{g}_1^{\mathrm{T}}(\mathbf{x}^{\mathrm{T}}\varvec{\beta }_1)\mathbf{z}-\mathbf{g}_2^{\mathrm{T}}(\mathbf{x}^{\mathrm{T}}\varvec{\beta }_2)\mathbf{z}|\\&\qquad \le C \left( E\left[ \mathbf{g}_1(\mathbf{x}^{\mathrm{T}}\varvec{\beta }_1)-\mathbf{g}_2(\mathbf{x}^{\mathrm{T}}\varvec{\beta }_2)\right] ^2\right) ^{1/2}. \end{aligned}$$

Thus \(\log N(\delta ,\mathcal {F}_2,L_2)\le C\log N(C\delta ^2,\mathcal {F}_1,L_2)\le C\delta ^{-2/\alpha }\), and \(\mathcal {F}_2\) is Donsker if \(\alpha >1\). Since both \(\mathcal {F}_1\) and \(\mathcal {F}_2\) are Donsker classes, it is easy to see that \(\mathcal {F}\) is also a Donsker class.

First we prove consistency. Let \(F(.|\mathbf{X})\) be the conditional cumulative distribution function of Y. Uniformly for all \(\varvec{\beta }\in \mathcal {B}\), we have

$$\begin{aligned}&\Phi (\varvec{\beta };\widehat{\mathbf{m}}(.;\varvec{\beta }))-\Phi (\varvec{\beta };\mathbf{m}(.;\varvec{\beta }))\nonumber \\&\quad =\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \left( \tau -F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\qquad -\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \left( \tau -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\quad =\mathbf{J}^{\mathrm{T}}E \left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\left( \widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta }) -\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) -\left( \widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta })\right. \right. \nonumber \\&\qquad \left. \left. -\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta })\right) \mathbf{Z}\right) \left( \tau -F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\qquad -\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta }) -\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta })\mathbf{Z}\right) \left( F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right. \nonumber \\&\qquad \left. -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\quad =o_p(1), \end{aligned}$$
(13)

using Proposition 1. Furthermore, by the Glivenko–Cantelli Theorem (\(\mathcal {F}\) is Donsker implies it is Glivenko–Cantelli), \(\sup _{f\in \mathcal {F}}(P_n-P)f=o_p(1)\). Thus uniformly for \(\varvec{\beta }\in \mathcal {B}\),

$$\begin{aligned} \Vert \Phi _n(\varvec{\beta };\widehat{\mathbf{m}}(.;\varvec{\beta }))-\Phi (\varvec{\beta };\widehat{\mathbf{m}}(.;\varvec{\beta }))\Vert =o_p(1). \end{aligned}$$
(14)

Thus

$$\begin{aligned}&\Vert \Phi (\widehat{\varvec{\beta }};\mathbf{m}(.;\widehat{\varvec{\beta }}))\Vert \\&\quad =\Vert \Phi (\widehat{\varvec{\beta }};\widehat{\mathbf{m}}(.;\widehat{\varvec{\beta }}))\Vert +o_p(1)\\&\quad = \Vert \Phi _n(\widehat{\varvec{\beta }};\widehat{\mathbf{m}}(.;\widehat{\varvec{\beta }}))\Vert +o_p(1)\\&\quad \le \Vert \Phi _n(\varvec{\beta }_0;\widehat{\mathbf{m}}(.;\varvec{\beta }_0))\Vert +o_p(1)\\&\quad =\Vert \Phi (\varvec{\beta }_0;\widehat{\mathbf{m}}(.;\varvec{\beta }_0))\Vert +o_p(1)\\&\quad =\Vert \Phi (\varvec{\beta }_0;\mathbf{m}(.;\varvec{\beta }_0))\Vert +o_p(1), \end{aligned}$$

where the first and the last equalities use (13), the second and the third equalities use (14), and the inequality follows from the definition of \(\widehat{\varvec{\beta }}\) as an approximate minimizer of \(\Vert \Phi _n(\varvec{\beta };\widehat{\mathbf{m}}(.;\varvec{\beta }))\Vert \). Thus, by assumption (A4), \(\Vert \widehat{\varvec{\beta }}-\varvec{\beta }_0\Vert <\epsilon \) with probability approaching one, for any \(\epsilon >0\). This shows \(\Vert \widehat{\varvec{\beta }}-\varvec{\beta }_0\Vert =o_p(1)\).

Now we consider asymptotic normality. For readability, we split the proof into several steps.

Step 1 By consistency, Lemma 19.24 in van der Vaart (1998) then implies that

$$\begin{aligned} {\mathbb {G}}_n(\phi _{\widehat{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}})=o_p(1), \end{aligned}$$
(15)

where \({\mathbb {G}}_n=\sqrt{n}(P_n-P)\) is the empirical process.
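For readers less used to the operator notation, \(P_nf=n^{-1}\sum _if(W_i)\), \(Pf=Ef(W)\), and for a fixed \(f\) the central limit theorem gives \({\mathbb {G}}_nf\rightarrow N(0,Pf^2-(Pf)^2)\). A small Monte Carlo illustration (a toy setup of my own choosing, unrelated to the paper's model):

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    # a fixed indicator function f(x) = 1{x <= 0.3}
    return (x <= 0.3).astype(float)

Pf = 0.3                     # P f when X ~ Uniform(0, 1)
n, reps = 400, 5000
# G_n f = sqrt(n) (P_n - P) f, replicated over independent samples of size n
Gn = np.array([np.sqrt(n) * (f(rng.uniform(size=n)).mean() - Pf)
               for _ in range(reps)])
# CLT: G_n f is approximately N(0, Pf (1 - Pf)), so its sd is near sqrt(0.21)
```

The Donsker property used in the proof strengthens this pointwise statement to convergence of \({\mathbb {G}}_nf\) uniformly over the whole class \(\mathcal {F}\).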

Step 2 We show

$$\begin{aligned} \sqrt{n}P(\phi _{\widehat{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}})=\varvec{\Psi }_1\sqrt{n}(\widehat{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)})+o_p(\sqrt{n}(\widehat{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)}))+o_p(1). \end{aligned}$$
(16)

In the proof of (16), we write \(\widehat{\varvec{\beta }}\) as \(\varvec{\beta }\) for simplicity of notation. Writing \(P(\phi _{\varvec{\beta },\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}})=P(\phi _{\varvec{\beta },\widehat{\mathbf{m}}}-\phi _{\varvec{\beta },\mathbf{m}})+P(\phi _{\varvec{\beta },\mathbf{m}}-\phi _{\varvec{\beta }_0,\mathbf{m}})\), we first compute \(P(\phi _{\varvec{\beta },\widehat{\mathbf{m}}}-\phi _{\varvec{\beta },\mathbf{m}})\) as follows.

$$\begin{aligned}&P(\phi _{\varvec{\beta },\widehat{\mathbf{m}}}-\phi _{\varvec{\beta },\mathbf{m}})\\&\quad =\mathbf{J}^{\mathrm{T}}E \left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \left( \tau -F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\widehat{\varvec{\beta }};\widehat{\varvec{\beta }})\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\qquad -\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \left( \tau -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\quad = \mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\left( \widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) -\left( \widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) \mathbf{Z}\right) \\&\qquad \cdot \left( \tau -F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\qquad -\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta }) -\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \left( F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right. \\&\qquad \left. 
-F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\quad =\mathbf{J}^{\mathrm{T}}E \left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\left( \widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) -\left( \widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) \mathbf{Z}\right) \\&\qquad \cdot \left( F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) -F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\qquad +\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\left( \widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) -\left( \widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) \mathbf{Z}\right) \\&\qquad \cdot \left( F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}|\mathbf{X}\right) -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\qquad +\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\left( \widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) -\left( \widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) \mathbf{Z}\right) \\&\qquad \cdot \left( \tau -F\left( 
\mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\qquad -\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \left( F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right. \nonumber \\&\qquad \left. -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) .\nonumber \\ \end{aligned}$$

By Proposition 1, the first term above is \(o(n^{-1/2})\) and the second term is \(o_p(\Vert \varvec{\beta }-\varvec{\beta }_0\Vert )\). The third term is exactly zero since \(F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}|\mathbf{X}\right) =\tau \). Finally, by a Taylor expansion, the last term is

$$\begin{aligned}&\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) f\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \\&\qquad \times \left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}-\mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \\&\qquad +O_p\left( \left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}-\mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) ^2\right) . \end{aligned}$$

The first term above is zero since \(\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\) is the projection of \(\mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\) onto \(\mathcal {M}_{\varvec{\beta }}\) while \(\widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}-\mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\in \mathcal {M}_{\varvec{\beta }}\). The second term above is \(o_p(n^{-1/2})\) by Proposition 1.

Now we compute \(P(\phi _{\varvec{\beta },\mathbf{m}}-\phi _{\varvec{\beta }_0,\mathbf{m}})\). We have

$$\begin{aligned}&E[\phi _{ {\varvec{\beta }},\mathbf{m}}-\phi _{\varvec{\beta }_0,\mathbf{m}}|\mathbf{X}]\\&\quad =\mathbf{J}^{\mathrm{T}}\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}\right) \left( \tau -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}\right) \right) \nonumber \\&\quad =\mathbf{J}^{\mathrm{T}}\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}\right) \left( F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}\right) -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}\right) \right) \nonumber \\&\quad =\mathbf{J}^{\mathrm{T}}\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}\right) \\&\qquad f\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)-\mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\right) \mathbf{Z}+o_p\left( n^{-1/2}\right) \nonumber \\&\quad =-\mathbf{J}^{\mathrm{T}}\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}\right) f\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \nonumber \\&\qquad \left. \frac{d\mathbf{g}^{\mathrm{T}}}{d\varvec{\beta }}\right. 
\mathbf{Z}( \varvec{\beta }-\varvec{\beta }_0)+o_p\left( n^{-1/2}\right) \nonumber \\&\quad =-\left( \mathbf{J}^{\mathrm{T}}\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}\right) \right) ^{\otimes 2}f\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \left( \varvec{\beta }^{(-1)}-\varvec{\beta }_0^{(-1)}\right) \\&\qquad +o_p\left( \Vert \varvec{\beta }-\varvec{\beta }_0\Vert \right) +o_p\left( n^{-1/2}\right) \nonumber \\&\quad =-\left( \mathbf{J}^{\mathrm{T}}\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}\right) \right) ^{\otimes 2}f\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0)\mathbf{Z}|\mathbf{X}\right) \nonumber \\&\qquad \times \left( \varvec{\beta }^{(-1)}-\varvec{\beta }_0^{(-1)}\right) +o_p\left( \Vert \varvec{\beta }-\varvec{\beta }_0\Vert \right) +o_p\left( n^{-1/2}\right) , \end{aligned}$$

where in the second to last line, we used the identity

$$\begin{aligned} \frac{d \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })}{d \varvec{\beta }}\mathbf{Z}=\mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-E\left[ \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })|\mathcal {M}_{\varvec{\beta }}\right] , \end{aligned}$$

as derived in the previous subsection.

Step 3 Define \(\widetilde{\varvec{\beta }}^{(-1)}=\varvec{\beta }_0^{(-1)}-\varvec{\Psi }_1^{-1}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}\). We then have

$$\begin{aligned} {\mathbb {G}}_n\left( \phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}}\right) =o_p(1). \end{aligned}$$

In fact, by the central limit theorem, it is easy to see that \(\Vert \widetilde{\varvec{\beta }}-\varvec{\beta }_0\Vert =O_p\left( n^{-1/2}\right) \), and the proof is similar to that of Step 1 (actually, only \(\Vert \widetilde{\varvec{\beta }}-\varvec{\beta }_0\Vert =o_p(1)\) is needed here).

Step 4 \(\sqrt{n}P_n\phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}=o_p(1)\).

Rewriting the result in Step 3 as

$$\begin{aligned} \sqrt{n}P_n\phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}=\sqrt{n}P\left( \phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}}\right) +\sqrt{n}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}+o_p(1), \end{aligned}$$

using the same arguments as in Step 2, we have

$$\begin{aligned} \sqrt{n}P\left( \phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}}\right) =\varvec{\Psi }_1\sqrt{n}(\widetilde{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)})+o_p(1), \end{aligned}$$

and thus

$$\begin{aligned} \sqrt{n}P_n\phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}= \varvec{\Psi }_1\sqrt{n}\left( \widetilde{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)}\right) +\sqrt{n}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}+o_p(1)=o_p(1), \end{aligned}$$

by the definition of \(\widetilde{\varvec{\beta }}\).

Step 5 Finish the proof. Since \(\widehat{\varvec{\beta }}\) minimizes \(\Vert \sqrt{n}P_n\phi _{\varvec{\beta },\widehat{\mathbf{m}}}\Vert \) (up to an \(o_p(1)\) term), we have \(\Vert \sqrt{n}P_n\phi _{\widehat{\varvec{\beta }},\widehat{\mathbf{m}}}\Vert \le \Vert \sqrt{n}P_n\phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}\Vert +o_p(1)=o_p(1)\).

Thus (15) can be rewritten as

$$\begin{aligned} \sqrt{n}P\left( \phi _{\widehat{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}}\right) =\sqrt{n}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}+o_p(1). \end{aligned}$$
(17)

Using the result in Step 2 on the left-hand side of (17), we have

$$\begin{aligned} \sqrt{n}\left( \widehat{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)}\right) =\varvec{\Psi }_1^{-1}\sqrt{n}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}+o_p\left( \sqrt{n}\left( \widehat{\varvec{\beta }}-\varvec{\beta }_0\right) \right) +o_p(1). \end{aligned}$$

This implies root-n consistency of \(\widehat{\varvec{\beta }}^{(-1)}\) as well as the asymptotic normality.\(\square \)

Proposition 1

For \(\alpha >5/2\), \(\alpha '>1/2\),

$$\begin{aligned}&\sup _{\varvec{\beta }\in \mathcal {B}} \left\| \widehat{\mathbf{g}}(.;\varvec{\beta })-\mathbf{g}(.;\varvec{\beta })\right\| =O_p\left( n^{-\alpha /(2\alpha +1)}(\mathrm{\log } n)^{1/2}\right) ,\\&\sup _{\varvec{\beta }\in \mathcal {B}} \left\| \widehat{\mathbf{g}}'(.;\varvec{\beta })-\mathbf{g}'(.;\varvec{\beta })\right\| =O_p\left( n^{-(\alpha -1)/(2\alpha +1)}(\mathrm{\log } n)^{1/2}\right) , \end{aligned}$$

and

$$\begin{aligned} \sup _{\varvec{\beta }\in \mathcal {B}} \left\| \widehat{\mathbf{H}}(.;\varvec{\beta })-\mathbf{H}(.;\varvec{\beta })\right\| =O_p\left( \max \left\{ n^{-\alpha '/(2\alpha '+1)},n^{-(\alpha -1)/(2\alpha +1)}\right\} (\log n)^{1/2}\right) . \end{aligned}$$

In particular, all rates above are of order \(o_p(n^{-1/4})\). The term \((\log n)^{1/2}\) can be roughly regarded as the cost of taking supremum over \(\varvec{\beta }\in \mathcal {B}\).
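The smoothness thresholds \(\alpha >5/2\) and \(\alpha '>1/2\) are exactly what is needed for every exponent above to exceed \(1/4\); a quick check of the boundary cases (my own illustration, using exact rational arithmetic):

```python
from fractions import Fraction

def g_exp(alpha):
    # exponent in the rate n^{-alpha/(2 alpha + 1)} for ghat (and for Hhat
    # with alpha replaced by alpha')
    return alpha / (2 * alpha + 1)

def gprime_exp(alpha):
    # exponent in the slower rate n^{-(alpha - 1)/(2 alpha + 1)} for ghat'
    return (alpha - 1) / (2 * alpha + 1)

quarter = Fraction(1, 4)
# (alpha - 1)/(2 alpha + 1) = 1/4 exactly at alpha = 5/2, so alpha > 5/2
# makes even the slowest rate o_p(n^{-1/4}); likewise alpha' = 1/2 is the
# boundary for the exponent alpha'/(2 alpha' + 1).
assert gprime_exp(Fraction(5, 2)) == quarter
assert gprime_exp(Fraction(3, 1)) > quarter
assert g_exp(Fraction(1, 2)) == quarter
```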

Proof

For illustration, we prove only the first rate; the second is easily derived from the first. The third rate is also easier since \(\widehat{\mathbf{H}}\) is obtained by minimizing a smooth weighted least squares criterion (unlike quantile regression). If \(\mathbf{g}'\) were known, we would have the standard nonparametric rate \(n^{-\alpha '/(2\alpha '+1)}\) for \(\widehat{\mathbf{H}}\) (up to a logarithmic factor). The other term arises because \(\mathbf{g}'\) is estimated by \(\widehat{\mathbf{g}}'\), which contributes a term of order \(O_p(n^{-(\alpha -1)/(2\alpha +1)}(\log n)^{1/2})\). Since the arguments are standard, the details for the rates of \(\widehat{\mathbf{H}}\) are omitted.

Now we set out to show the uniform convergence rate of \(\widehat{\mathbf{g}}\). Note that \(\widehat{\mathbf{g}}(u;\varvec{\beta })=\widehat{\varvec{\Theta }}^{\mathrm{T}}\mathbf{B}(u)\), where \(\widehat{\varvec{\Theta }}=\widehat{\varvec{\Theta }}(\varvec{\beta })\) is the minimizer of

$$\begin{aligned} \sum _i\rho _\tau \left( Y_i-\mathbf{B}^{\mathrm{T}}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\varvec{\Theta }\mathbf{Z}_i\right) . \end{aligned}$$
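The inner minimization has no closed form, but it is a convex problem that can be solved numerically. Below is a minimal sketch using subgradient descent, with a toy polynomial design standing in for the actual tensor spline basis \(\mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\); all names and data are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.5  # quantile level

def check_loss(u, tau):
    # rho_tau(u) = u * (tau - I{u <= 0})
    return u * (tau - (u <= 0))

# Hypothetical design: a real implementation would use the rows
# (Z_i (x) B(X_i^T beta))^T; a small polynomial basis stands in here.
n, K = 200, 4
x = rng.uniform(-1.0, 1.0, n)
D = np.vander(x, K)
y = np.sin(2.0 * x) + rng.normal(0.0, 0.3, n)

# Subgradient descent on theta -> sum_i rho_tau(y_i - D_i^T theta).
theta = np.zeros(K)
loss_start = check_loss(y - D @ theta, tau).sum()
for _ in range(3000):
    r = y - D @ theta
    subgrad = -(D.T @ (tau - (r <= 0))) / n  # subgradient of the average loss
    theta -= 0.05 * subgrad
loss_end = check_loss(y - D @ theta, tau).sum()
assert loss_end < loss_start  # the check loss has decreased
```

In practice the minimization is typically done with linear-programming solvers for quantile regression; the subgradient loop above is only meant to make the objective concrete.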

Let \(\varvec{\Theta }_0=\varvec{\Theta }_0(\varvec{\beta })\) be such that \(\Vert \varvec{\Theta }_0^{\mathrm{T}}\mathbf{B}(.)-\mathbf{g}(.;\varvec{\beta })\Vert _\infty \le CK^{-d}.\) Define \(\varvec{\theta }=\mathrm{vec}(\varvec{\Theta })\), \(\widehat{\varvec{\theta }}=\mathrm{vec}(\widehat{\varvec{\Theta }})\), \(\varvec{\theta }_0(\varvec{\beta })=\mathrm{vec}(\varvec{\Theta }_0(\varvec{\beta }))\). Note that \(\mathbf{B}^{\mathrm{T}}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\varvec{\Theta }\mathbf{Z}_i\) can also be written as \((\mathbf{Z}_i\otimes \mathbf{B}^{\mathrm{T}}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta }))^{\mathrm{T}}\varvec{\theta }\).

The general strategy of the proof is similar to that in He and Shi (1994). However, besides the fact that we have a single-index model instead of a simple univariate nonparametric regression, it turns out to be nontrivial to deal with the supremum over \(\varvec{\beta }\), and this requires an important modification of the arguments used in He and Shi (1994). The proof of Proposition 1 is completed by combining Lemmas 1, 3 and 4 below. \(\square \)

Define \(m_i(\varvec{\beta })=\mathbf{g}^{\mathrm{T}}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}_i\) and \(e_i(\varvec{\beta })=Y_i-m_i(\varvec{\beta })\). Note that \(\tau -I\left\{ e_i(\varvec{\beta })\le 0\right\} \) does not have mean zero in general (unless \(\varvec{\beta }=\varvec{\beta }_0\)), but \(\mathbf{Z}_i\left( \tau -I\left\{ e_i(\varvec{\beta })\le 0\right\} \right) \) still has mean zero, as in (4), which is sufficient for our purpose.

Lemma 1

Let \(r_n=\left( \sqrt{K/n}+K^{-\alpha }\right) (\log n)^{1/2}\). Then

$$\begin{aligned}&\sup _{\varvec{\beta }\in \mathcal {B},\Vert \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\Vert \le Cr_n} \sum _{i=1}^n\rho _\tau \left( Y_i-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }\right) \\&\qquad - \sum _{i=1}^n\rho _\tau \left( Y_i-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }_{0}(\varvec{\beta })\right) \\&\qquad +\sum _{i=1}^n\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_{0}(\varvec{\beta })\right) \left( \tau -I\left\{ e_i(\varvec{\beta })\le 0\right\} \right) \\&\qquad -E\sum _{i=1}^n\rho _\tau \left( Y_i-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }\right) \\&\qquad +E\sum _{i=1}^n\rho _\tau \left( Y_i-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }_{0}(\varvec{\beta })\right) =o_p\left( nr_n^2\right) , \end{aligned}$$

where the expectations are over \(Y_i\) conditional on \(\mathbf{X}_i\) (all expectations below are also such conditional expectations).

Proof

As in He and Shi (1994), we consider median regression with \(\tau =1/2\) and \(\rho _\tau (u)=|u|/2\) in the proof; the general case can be shown in the same way. For any \(\varvec{\beta }\in \mathcal {B}\), let \(\mathcal {N}_{\varvec{\beta }}=\left\{ \varvec{\theta }^{(1)}(\varvec{\beta }),\ldots ,\varvec{\theta }^{(N)}(\varvec{\beta })\right\} \) be a \(\delta _n\)-covering of \(\left\{ \varvec{\theta }: \Vert \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\Vert \le Cr_n\right\} \), with size bounded by \(N\le (C/\delta _n)^{CK}\) (see, for example, Lemma 2.5 of van de Geer (2000) for the bound) and thus \(\log N\le C K\log n\) if we choose \(\delta _n\sim n^{-a}\) for some \(a>0\) (we will choose a to be large enough). Let \((\varvec{\beta }^{(1)},\ldots ,\varvec{\beta }^{(N')})\) be a \(\delta _n\)-covering of \(\mathcal {B}\) (it is well known that \(\log N'\le C\log n\)) and set \(\mathcal {N}=\cup _{1\le j\le N'}\{\varvec{\beta }^{(j)}\}\times \mathcal {N}_{\varvec{\beta }^{(j)}}\). We denote the elements of \(\mathcal {N}\) by \((\varvec{\beta }_s,\varvec{\theta }_s), 1\le s\le S\), with \(\log S\le CK\log n\).
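The entropy bookkeeping here is simple arithmetic; a quick sketch with illustrative constants (not from the paper):

```python
from math import log

# With delta_n = n^{-a}, the bound N <= (C/delta_n)^{CK} gives
# log N <= C*K*(log C + a*log n) = O(K log n), while log N' <= C log n,
# so the combined net N has log S <= C*K*log n.
def log_N_bound(n, K, a, C=3.0):
    return C * K * (log(C) + a * log(n))

n, K, a = 10_000, 25, 4.0
log_N = log_N_bound(n, K, a)       # covering of the theta-ball
log_Nprime = 3.0 * log(n)          # covering of the compact set B
log_S = log_Nprime + log_N
assert log_S <= 20.0 * K * log(n)  # i.e., log S <= C K log n
```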

Define \(M_{ni}(\varvec{\beta },\varvec{\theta })=\frac{1}{2}|Y_i-(\mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta }))^{\mathrm{T}}\varvec{\theta }|-\frac{1}{2}|Y_i-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }_{0}(\varvec{\beta })|+\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) \left( 1/2-I\left\{ e_i(\varvec{\beta })\le 0\right\} \right) \), and \(M_n(\varvec{\beta },\varvec{\theta })=\sum _{i=1}^nM_{ni}(\varvec{\beta },\varvec{\theta })\). Next we claim that for any \(\varvec{\beta }\) and any \(\varvec{\theta }\) with \(\Vert \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\Vert \le Cr_n\), there exists some \((\varvec{\beta }_s,\varvec{\theta }_s)\in \mathcal {N}\) such that

$$\begin{aligned} |M_n(\varvec{\beta },\varvec{\theta })-EM_n(\varvec{\beta },\varvec{\theta })-M_n(\varvec{\beta }_s,\varvec{\theta }_s)+EM_n(\varvec{\beta }_s,\varvec{\theta }_s)|=o_p(nr_n^2). \end{aligned}$$
(18)

By the construction of \(\mathcal {N}\), it is obvious that we can find \((\varvec{\beta }_s,\varvec{\theta }_s)\in \mathcal {N}\) such that \(\Vert \varvec{\beta }-\varvec{\beta }_s\Vert +\Vert \varvec{\theta }-\varvec{\theta }_s\Vert \le C\delta _n\). In He and Shi (1994), the \(\varvec{\beta }\) in the indicator function \(I\{e_i(\varvec{\beta })\le 0\}\) in the definition of \(M_n(\varvec{\beta },\varvec{\theta })\) is actually \(\varvec{\beta }_0\), so that \(M_n(\varvec{\beta },\varvec{\theta })\) is Lipschitz in \((\varvec{\beta },\varvec{\theta })\) and (18) is trivially satisfied when \(\delta _n\sim n^{-a}\) with a sufficiently large. Here, however, proving (18) is nontrivial since \(I\left\{ e_i(\varvec{\beta })\le 0\right\} \) is not continuous in \(\varvec{\beta }\); we deal with this term in Lemma 2 below. Since all other terms in the definition of \(M_n(\varvec{\beta },\varvec{\theta })\) are Lipschitz continuous, (18) follows.

By (18), we only need to show that

$$\begin{aligned} \sup _{(\varvec{\beta }_s,\varvec{\theta }_s)\in \mathcal {N}}|M_n(\varvec{\beta }_s,\varvec{\theta }_s)-EM_n(\varvec{\beta }_s,\varvec{\theta }_s)|=o_p(nr_n^2). \end{aligned}$$

By simple algebra

$$\begin{aligned} |M_{ni}(\varvec{\beta },\varvec{\theta })|= & {} \left| \frac{1}{2}|Y_i-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }|-\frac{1}{2}|Y_i-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }_0(\varvec{\beta })|\right. \\&\left. +\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) \left( 1/2-I\left\{ e_i(\varvec{\beta })\le 0\right\} \right) \right| \\= & {} \Bigg | \frac{1}{2}|e_i(\varvec{\beta })+m_i(\varvec{\beta })-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }|\\&-\frac{1}{2}|e_i(\varvec{\beta })+m_i(\varvec{\beta })-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }_0(\varvec{\beta })|\\&+\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) \left( 1/2-I\{e_i(\varvec{\beta })\le 0\}\right) \Bigg |\\\le & {} |(\mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta }))^{\mathrm{T}}(\varvec{\theta }-\varvec{\theta }_0(\varvec{\beta }))|\\&\times I\{|e_i(\varvec{\beta })|\le |\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) |\\&+|m_i(\varvec{\beta })-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }_0(\varvec{\beta })|\}. \end{aligned}$$

Thus

$$\begin{aligned} |M_{ni}(\varvec{\beta },\varvec{\theta })|\le & {} C\sqrt{K}r_n=: A, \end{aligned}$$

where we used that \(\Vert \mathbf{B}(x)\Vert \le C\sqrt{K}\) at any fixed point \(x\in [a,b]\).

Furthermore, we have

$$\begin{aligned} E|M_{ni}(\varvec{\beta },\varvec{\theta })|^2\le & {} C(\sqrt{K}r_n) E |\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) |^2\nonumber \\\le & {} C(\sqrt{K}r_n)(r_n^2)=:D^2. \end{aligned}$$
(19)

Using Bernstein’s inequality together with the union bound, we have

$$\begin{aligned} P\left( \sup _{(\varvec{\beta },\varvec{\theta })\in \mathcal {N}} |M_n(\varvec{\beta },\varvec{\theta })-EM_n(\varvec{\beta },\varvec{\theta })|>a\right) \le C\exp \left\{ -\frac{Ca^2}{aA+nD^2}+CK\log n\right\} . \end{aligned}$$

The right-hand side converges to zero with \(a=O\left( \max \left\{ K^{3/2}r_n\log n,\right. \right. \left. \left. \sqrt{nK^{3/2}r_n^3\log n}\right\} \right) =o\left( nr_n^2\right) \). \(\square \)
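For reference, the form of Bernstein's inequality invoked above can be stated as follows (a standard version; constants are not tracked):

```latex
% Bernstein's inequality: if T_1,\dots,T_n are independent, mean zero,
% |T_i| \le A and E T_i^2 \le D^2, then for every a > 0
P\left(\Big|\sum_{i=1}^n T_i\Big| > a\right)
  \le 2\exp\left\{-\frac{a^2}{2\left(nD^2 + Aa/3\right)}\right\}
  \le C\exp\left\{-\frac{Ca^2}{aA + nD^2}\right\}.
% Taking a union bound over the at most \exp\{CK\log n\} elements of
% \mathcal{N} multiplies the bound by \exp\{CK\log n\}, which produces
% the CK\log n term in the exponent.
```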

Lemma 2

$$\begin{aligned}&\sup _{1\le s\le S, \Vert \varvec{\beta }-\varvec{\beta }_s\Vert +\Vert \varvec{\theta }-\varvec{\theta }_s\Vert \le \delta _n} \Big |\sum _i\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) I\left\{ e_i(\varvec{\beta })\le 0\right\} \\&\qquad -\sum _i\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta }_s)\right) ^{\mathrm{T}}\left( \varvec{\theta }_s-\varvec{\theta }_0(\varvec{\beta }_s)\right) I\left\{ e_i(\varvec{\beta }_s)\le 0\right\} \Big |\\&\quad =o_p\left( nr_n^2\right) . \end{aligned}$$

Proof

Writing \(W_i(\varvec{\beta },\varvec{\theta })=\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) \), we only need to show that

$$\begin{aligned}&\sup _{1\le s\le S, \Vert \varvec{\beta }-\varvec{\beta }_s\Vert +\Vert \varvec{\theta }-\varvec{\theta }_s\Vert \le \delta _n}\left| \sum _i W_i(\varvec{\beta },\varvec{\theta }) \left( I\left\{ e_i(\varvec{\beta })\le 0\right\} -F\left( m_i(\varvec{\beta })|\mathbf{X}_i\right) \right) \right. \nonumber \\&\qquad \left. -\sum _i W_i(\varvec{\beta }_s,\varvec{\theta }_s) \left( I\left\{ e_i(\varvec{\beta }_s)\le 0\right\} -F\left( m_i(\varvec{\beta }_s)|\mathbf{X}_i\right) \right) \right| \nonumber \\&\quad =o_p\left( nr_n^2\right) . \end{aligned}$$
(20)

Obviously (20) is implied by

$$\begin{aligned}&\sup _{1\le s\le S, \Vert \varvec{\beta }-\varvec{\beta }_s\Vert +\Vert \varvec{\theta }-\varvec{\theta }_s\Vert \le \delta _n} \left| \sum _i W_i(\varvec{\beta }_s,\varvec{\theta }_s) \left( I\left\{ e_i(\varvec{\beta })\le 0\right\} \right. \right. \nonumber \\&\qquad \left. \left. -I\left\{ e_i(\varvec{\beta }_s)\le 0\right\} -F\left( m_i(\varvec{\beta })|\mathbf{X}_i\right) +F\left( m_i(\varvec{\beta }_s)|\mathbf{X}_i\right) \right) \right| \nonumber \\&\quad =o_p\left( nr_n^2\right) . \end{aligned}$$
(21)

Assume \(W_i(\varvec{\beta }_s,\varvec{\theta }_s)>0\) for now; we first show (21) without the absolute value on the left-hand side. Since \(\Vert \varvec{\beta }-\varvec{\beta }_s\Vert \le \delta _n\), by our assumption we have \(|e_i(\varvec{\beta })-e_i(\varvec{\beta }_s)|\le C\delta _n\). By the monotonicity of the function \(t\rightarrow I\left\{ e_i(\varvec{\beta }_s)\le t\right\} \), the left-hand side of (21) is bounded by

$$\begin{aligned}&\sum _i W_i(\varvec{\beta }_s,\varvec{\theta }_s) \left( I\left\{ e_i(\varvec{\beta }_s)\le C\delta _n\right\} -I\left\{ e_i(\varvec{\beta }_s)\le 0\right\} \right. \nonumber \\&\left. \qquad -F\left( m_i(\varvec{\beta })|\mathbf{X}_i\right) +F\left( m_i(\varvec{\beta }_s)|\mathbf{X}_i\right) \right) \nonumber \\&\quad =\sum _i W_i(\varvec{\beta }_s,\varvec{\theta }_s)\left( I\left\{ e_i(\varvec{\beta }_s)\le C\delta _n\right\} -I\left\{ e_i(\varvec{\beta }_s)\le 0\right\} \right. \nonumber \\&\left. \qquad -F\left( m_i(\varvec{\beta }_s)+C\delta _n|\mathbf{X}_i\right) +F\left( m_i(\varvec{\beta }_s)|\mathbf{X}_i\right) \right) \nonumber \\&\qquad +\sum _i W_i(\varvec{\beta }_s,\varvec{\theta }_s) \left( F\left( m_i(\varvec{\beta }_s)+C\delta _n|\mathbf{X}_i\right) -F\left( m_i(\varvec{\beta })|\mathbf{X}_i\right) \right) . \end{aligned}$$
(22)

The first term of (22) is \(o_p\left( nr_n^2\right) \), which follows easily from Bernstein's inequality, the union bound, and the choice \(\delta _n\sim n^{-a}\) with a sufficiently large. The second term of (22) is also \(o_p\left( nr_n^2\right) \) since \(\Vert \varvec{\beta }-\varvec{\beta }_s\Vert \le \delta _n\). Using \(I\left\{ e_i(\varvec{\beta })\le 0\right\} \ge I\left\{ e_i(\varvec{\beta }_s)\le -C\delta _n\right\} \), the matching bound with the opposite sign is obtained in the same way.

So far we have assumed that \(W_i(\varvec{\beta }_s,\varvec{\theta }_s)>0\). In general, we can consider the positive part and the negative part of \(W_i(\varvec{\beta }_s,\varvec{\theta }_s) \) separately and the proof is complete. \(\square \)

Lemma 3

For \(L>0\) large enough

$$\begin{aligned}&\inf _{\varvec{\beta }\in \mathcal {B},\Vert \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\Vert = Lr_n}\sum _iE\rho _\tau \left( e_i(\varvec{\beta })+m_i(\varvec{\beta })-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }\right) \\&\qquad -\sum _iE\rho _\tau \left( e_i(\varvec{\beta })+m_i(\varvec{\beta })-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }_0(\varvec{\beta })\right) \\&\quad \ge L^2 Cnr_n^2. \end{aligned}$$

Proof

Applying Knight’s identity \(\rho _\tau (x-y)-\rho _\tau (x)=-y\left( \tau -I\{x\le 0\}\right) +\int _0^y \left( I\{x\le t\}-I\{x\le 0\}\right) dt\) twice, once to each of the two terms, we have that

$$\begin{aligned}&E\sum _{i=1}^n \rho _\tau \left( e_i(\varvec{\beta })+m_i(\varvec{\beta })- \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }\right) \\&\qquad -E\sum _{i=1}^n\rho _\tau \left( e_i(\varvec{\beta })+m_i(\varvec{\beta })-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }_0(\varvec{\beta })\right) \\&\quad =\sum _i\int _{\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }_0(\varvec{\beta })-m_i(\varvec{\beta })}^{\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }-m_i(\varvec{\beta })} F\left( m_i(\varvec{\beta })+t|\mathbf{X}_i\right) -F\left( m_i(\varvec{\beta })|\mathbf{X}_i\right) dt\\&\quad \ge C\sum _i f\left( m_i(\varvec{\beta })|\mathbf{X}_i\right) \left[ \left( \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) \right) ^2\right. \\&\qquad \left. +2\left( \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) \right) \left( \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }_0(\varvec{\beta })-m_i(\varvec{\beta })\right) \right] . \end{aligned}$$

Combining

$$\begin{aligned} \sum _i \left( \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) \right) ^2\ge CL^2nr_n^2, \end{aligned}$$

and

$$\begin{aligned}&\sum _i \left( \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) \right) \left( \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }_0(\varvec{\beta })-m_i(\varvec{\beta })\right) \\&\quad \le CLnr_nK^{-d}, \end{aligned}$$

we get the statement of the lemma if L is large enough. \(\square \)
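Knight's identity used in the proof of Lemma 3 can be checked numerically. A self-contained sketch (the closed form of the integral below is elementary, obtained by cases on the signs of x and y):

```python
import numpy as np

rng = np.random.default_rng(1)

def rho(u, tau):
    # check loss rho_tau(u) = u * (tau - I{u <= 0})
    return u * (tau - (u <= 0.0))

def knight_integral(x, y):
    # \int_0^y ( I{x <= t} - I{x <= 0} ) dt in closed form:
    # for x > 0 the integrand is I{t >= x}; for x <= 0 it is -I{t < x}.
    if x > 0.0:
        return max(0.0, y - x)
    return max(0.0, x - y)

# Knight's identity:
# rho_tau(x - y) - rho_tau(x)
#   = -y*(tau - I{x <= 0}) + \int_0^y (I{x <= t} - I{x <= 0}) dt
for _ in range(1000):
    x, y = rng.normal(), rng.normal()
    tau = rng.uniform(0.1, 0.9)
    lhs = rho(x - y, tau) - rho(x, tau)
    rhs = -y * (tau - (x <= 0.0)) + knight_integral(x, y)
    assert abs(lhs - rhs) < 1e-10
```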

Lemma 4

$$\begin{aligned}&\sup _{\varvec{\beta }\in \mathcal {B},\Vert \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\Vert =Lr_n}\sum _i \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}(\varvec{\theta }-\varvec{\theta }_0)\left( \tau -I\left\{ e_i(\varvec{\beta })\le 0\right\} \right) \\&\quad =L\cdot O_p\left( nr_n^2\right) . \end{aligned}$$

Proof

By Lemma 2, we only need to consider supremum over \((\varvec{\beta }_s,\varvec{\theta }_s)\in \mathcal {N}\). For fixed \(\varvec{\beta }\) and \(\varvec{\theta }\), using \(\sum _i\left( \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) \right) ^2=O_p\left( L^2nr_n^2\right) \), and \(|\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) |\le CLr_n\sqrt{K}\), we have, by Bernstein’s inequality,

$$\begin{aligned}&P\left( \sum _i \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}(\varvec{\theta }-\varvec{\theta }_0)\left( \tau -I\left\{ e_i(\varvec{\beta })\le 0\right\} \right) >a\right) \\&\quad \le C\exp \left\{ -\frac{a^2}{aL\sqrt{K}r_n+nL^2r_n^2}\right\} . \end{aligned}$$

Thus

$$\begin{aligned}&P\left( \sup _{(\varvec{\beta },\varvec{\theta })\in \mathcal {N}}\sum _i \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}(\varvec{\theta }-\varvec{\theta }_0)\left( \tau -I\left\{ e_i(\varvec{\beta })\le 0\right\} \right) >a\right) \\&\quad \le C\exp \left\{ -\frac{a^2}{aL\sqrt{K}r_n+nL^2r_n^2}+CK\log n\right\} , \end{aligned}$$

which implies \(\sup _{(\varvec{\beta },\varvec{\theta })\in \mathcal {N}}\sum _i \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}(\varvec{\theta }-\varvec{\theta }_0)(\tau -I\left\{ e_i(\varvec{\beta })\le 0\right\} )=LO_p(\sqrt{n}r_n\sqrt{K\log n})=LO_p\left( nr_n^2\right) \). \(\square \)
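The last step, \(\sqrt{n}r_n\sqrt{K\log n}=O(nr_n^2)\), holds because \(r_n\ge \sqrt{K\log n/n}\) by the definition of \(r_n\). A quick numerical sketch (the choice \(K\sim n^{1/6}\) is illustrative only):

```python
from math import log, sqrt

def r_n(n, K, alpha):
    # r_n = (sqrt(K/n) + K^{-alpha}) * (log n)^{1/2}
    return (sqrt(K / n) + K ** (-alpha)) * sqrt(log(n))

# sqrt(n)*r_n*sqrt(K log n) <= n*r_n^2  iff  r_n >= sqrt(K log n / n),
# which holds since r_n >= sqrt(K/n)*(log n)^{1/2}.
for n in (10**3, 10**4, 10**5):
    K = max(2, round(n ** (1.0 / 6.0)))  # illustrative spline dimension
    rn = r_n(n, K, alpha=2.6)
    assert sqrt(n) * rn * sqrt(K * log(n)) <= n * rn ** 2
```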


Cite this article

Zhao, W., Li, J. & Lian, H. Adaptive varying-coefficient linear quantile model: a profiled estimating equations approach. Ann Inst Stat Math 70, 553–582 (2018). https://doi.org/10.1007/s10463-017-0599-8

