Abstract
We consider an estimating-equations approach to parameter estimation in the adaptive varying-coefficient linear quantile model. We propose estimating equations for the index vector of the model in which the unknown nonparametric functions are estimated by minimizing the check loss function, resulting in a profiled approach. The estimating equations have a bias-corrected form that makes undersmoothing of the nonparametric part unnecessary, and they allow the estimates to be computed with a simple fixed-point algorithm. We establish the asymptotic properties of the estimator using empirical process theory, with additional complications due to the nuisance nonparametric part. The finite-sample performance of the proposed method is illustrated using simulation studies and a forest fire dataset.
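The check loss underlying the profiled approach can be illustrated numerically. The sketch below is a toy illustration, not the paper's estimator: it verifies that minimizing the empirical check loss \(\rho _\tau (u)=u(\tau -I\{u<0\})\) recovers the sample \(\tau \)-quantile, with a simple grid search standing in for the fixed-point algorithm used in the paper.

```python
import numpy as np

def check_loss(u, tau):
    """Koenker-Bassett check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0).astype(float))

rng = np.random.default_rng(0)
y = rng.normal(size=20_000)
tau = 0.75

# Minimize the empirical check loss over a grid of candidate quantile
# values; the minimizer should agree with the sample tau-quantile.
grid = np.linspace(-3.0, 3.0, 1201)
risk = np.array([check_loss(y - q, tau).mean() for q in grid])
q_hat = grid[risk.argmin()]
```

The agreement holds because the empirical check risk is piecewise-linear and convex in the candidate quantile, with the sample \(\tau \)-quantile as its minimizer.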
References
Belloni, A., Chernozhukov, V. (2011). \(\ell _1\)-penalized quantile regression in high-dimensional sparse models. The Annals of Statistics, 39, 82–130.
Bondell, H. D., Reich, B. J., Wang, H. (2010). Noncrossing quantile regression curve estimation. Biometrika, 97, 825–838.
Cai, Z., Xiao, Z. (2012). Semiparametric quantile regression estimation in dynamic models with partially varying coefficients. Journal of Econometrics, 167, 413–425.
Carroll, R. J., Fan, J., Gijbels, I., Wand, M. P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association, 92, 477–489.
Chen, R., Tsay, R. S. (1993). Functional-coefficient autoregressive models. Journal of the American Statistical Association, 88, 298–308.
Cortez, P., Morais, A. (2007). A data mining approach to predict forest fires using meteorological data. In J. Neves, M. F. Santos and J. Machado (Eds.), New trends in artificial intelligence, Proceedings of the 13th EPIA 2007 - Portuguese Conference on Artificial Intelligence, 512–523.
Cui, X., Haerdle, W. K., Zhu, L. (2011). The EFM approach for single-index models. The Annals of Statistics, 39, 1658–1688.
Fan, J., Fan, Y., Barut, E. (2014a). Adaptive robust variable selection. Annals of Statistics, 42, 324–351.
Fan, J., Ma, Y., Dai, W. (2014b). Nonparametric independence screening in sparse ultra-high dimensional varying coefficient models. Journal of the American Statistical Association, 109, 1270–1284.
Fan, J., Yao, Q., Cai, Z. (2003). Adaptive varying-coefficient linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65, 57–80.
Fan, J. Q., Zhang, J. T. (2000). Two-step estimation of functional linear models with applications to longitudinal data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62, 303–322.
Fan, J. Q., Zhang, W. Y. (1999). Statistical estimation in varying coefficient models. Annals of Statistics, 27, 1491–1518.
Hall, P., Sheather, S. J. (1988). On the distribution of a studentized quantile. Journal of the Royal Statistical Society: Series B (Methodological), 50, 381–391.
Hastie, T., Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society: Series B (Methodological), 55, 757–796.
He, X., Shi, P. (1994). Convergence rate of B-spline estimators of nonparametric conditional quantile functions. Journal of Nonparametric Statistics, 3, 299–308.
Hendricks, W., Koenker, R. (1992). Hierarchical spline models for conditional quantiles and the demand for electricity. Journal of the American Statistical Association, 87, 58–68.
Hoover, D. R., Rice, J. A., Wu, C. O., Yang, L. P. (1998). Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika, 85, 809–822.
Horowitz, J. L., Lee, S. (2005). Nonparametric estimation of an additive quantile regression model. Journal of the American Statistical Association, 100, 1238–1249.
Hu, Y., Gramacy, R., Lian, H. (2013). Bayesian quantile regression for single-index models. Statistics and Computing, 23, 437–454.
Huang, J. H. Z., Wu, C. O., Zhou, L. (2002). Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika, 89, 111–128.
Jiang, L., Wang, H. J., Bondell, H. D. (2013). Interquantile shrinkage in regression models. Journal of Computational and Graphical Statistics, 22, 970–986.
Kim, M. (2007). Quantile regression with varying coefficients. Annals of Statistics, 35, 92–108.
Koenker, R., Bassett, G., Jr. (1978). Regression quantiles. Econometrica, 46, 33–50.
Koenker, R., Ng, P., Portnoy, S. (1994). Quantile smoothing splines. Biometrika, 81, 673–680.
Kong, E., Xia, Y. (2012). A single-index quantile regression model and its estimation. Econometric Theory, 28, 730–768.
Kottas, A., Krnjajić, M. (2009). Bayesian semiparametric modelling in quantile regression. Scandinavian Journal of Statistics, 36, 297–319.
Lai, P., Wang, Q., Lian, H. (2012). Bias-corrected gee estimation and smooth-threshold gee variable selection for single-index models with clustered data. Journal of Multivariate Analysis, 105, 422–432.
Lee, S. (2003). Efficient semiparametric estimation of a partially linear quantile regression model. Econometric Theory, 19, 1–31.
Lian, H. (2012a). Semiparametric estimation of additive quantile regression models by two-fold penalty. Journal of Business & Economic Statistics, 30, 337–350.
Lian, H. (2012b). Variable selection for high-dimensional generalized varying-coefficient models. Statistica Sinica, 22, 1563–1588.
Liu, J., Li, R., Wu, R. (2014). Feature selection for varying coefficient models with ultrahigh-dimensional covariates. Journal of the American Statistical Association, 109, 266–274.
Lu, Z., Tjøstheim, D., Yao, Q. (2007). Adaptive varying-coefficient linear models for stochastic processes: asymptotic theory. Statistica Sinica, 17, 177–198.
Reich, B., Bondell, H., Wang, H. (2010). Flexible Bayesian quantile regression for independent and clustered data. Biostatistics, 11, 337–352.
Sherwood, B., Wang, L. (2016). Partially linear additive quantile regression in ultra-high dimension. Annals of Statistics, 44, 288–317.
Tang, Y., Wang, H. J., Zhu, Z. (2013). Variable selection in quantile varying coefficient models with longitudinal data. Computational Statistics & Data Analysis, 57, 435–449.
Tokdar, S., Kadane, J. B. (2011). Simultaneous linear quantile regression: A semiparametric Bayesian approach. Bayesian Analysis, 6, 1–22.
van de Geer, S. A. (2000). Empirical processes in M-estimation. Cambridge: Cambridge University Press.
van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge: Cambridge University Press.
van der Vaart, A. W., Wellner, J. A. (1996). Weak convergence and empirical processes. New York: Springer.
Wang, H. J., Zhu, Z., Zhou, J. (2009). Quantile regression in partially linear varying coefficient models. The Annals of Statistics, 37, 3841–3866.
Wang, J.-L., Xue, L., Zhu, L., Chong, Y. S. (2010). Estimation for a partial-linear single-index model. Annals of Statistics, 38, 246–274.
Wang, L., Wu, Y., Li, R. (2012). Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association, 107, 214–222.
Wei, F., Huang, J., Li, H. Z. (2011). Variable selection and estimation in high-dimensional varying-coefficient models. Statistica Sinica, 21, 1515–1540.
Wu, T. Z., Yu, K., Yu, Y. (2010). Single-index quantile regression. Journal of Multivariate Analysis, 101, 1607–1621.
Wu, Y., Liu, Y. (2009). Variable selection in quantile regression. Statistica Sinica, 19, 801–817.
Xia, Y., Li, W. (1999). On single-index coefficient regression models. Journal of the American Statistical Association, 94, 1275–1285.
Xue, L., Qu, A. (2012). Variable selection in high-dimensional varying-coefficient models with global optimality. The Journal of Machine Learning Research, 13, 1973–1998.
Yang, Y., He, X. (2012). Bayesian empirical likelihood for quantile regression. The Annals of Statistics, 40, 1102–1131.
Yu, K., Jones, M. (1998). Local linear quantile regression. Journal of the American Statistical Association, 93, 228–237.
Yu, K., Moyeed, R. (2001). Bayesian quantile regression. Statistics & Probability Letters, 54, 437–447.
Yu, Y., Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association, 97, 1042–1054.
Zhu, L., Huang, M., Li, R. (2012). Semiparametric quantile regression with high-dimensional covariates. Statistica Sinica, 22, 1379–1401.
Zhu, L., Lin, L., Cui, X., Li, G. (2010). Bias-corrected empirical likelihood in a multi-link semiparametric model. Journal of Multivariate Analysis, 101, 850–868.
Acknowledgements
We sincerely thank the AE and two anonymous reviewers for their insightful comments which have led to significant improvement of the paper. The research of Zhao was supported in part by National Social Science Fund of China (15BTJ027), and the research of Lian was supported by a start up Grant (No. 7200521/MA) from the City University of Hong Kong.
Appendix: Technical proofs
In our proofs, C denotes a generic positive constant which can take different values even on the same line.
Proof of Theorem 1
Consider the class of functions
For \(\alpha >1/2\), define
For any fixed \(\varvec{\beta }\), the class \(\mathcal {F}_1(\varvec{\beta }):=\{h(\mathbf{x}^{\mathrm{T}}\varvec{\beta }): h\in \mathcal {C}^\alpha (R) \}\) has entropy \(\log N(\delta ,\mathcal {C}^{\alpha }(M),\Vert .\Vert _\infty )\le C\delta ^{-1/\alpha }\) by Theorem 2.7.1 of van der Vaart and Wellner (1996). Since \(\Vert h(\mathbf{x}^{\mathrm{T}}\varvec{\beta })-h(\mathbf{x}^{\mathrm{T}}\varvec{\beta }')\Vert _\infty \le C\Vert \varvec{\beta }-\varvec{\beta }'\Vert ^{s}\) for some \(s>1/2\), it is easy to see that the \(\delta \)-entropy of \(\mathcal {F}_1\) in the \(L_\infty \) norm is bounded by the sum of \(C\delta ^{-1/\alpha }\) and the \(\delta ^{1/s}\)-entropy of \(\mathcal {B}\) in the Euclidean norm, the latter being \(C\log (1/\delta )\). Thus, by Theorem 19.14 of van der Vaart (1998) and the fact that \(\mathbf{x}\) lies in a compact set, \(\mathcal {F}_1\) is a Donsker class.
Furthermore, consider
We have
Thus \(\log N(\delta ,\mathcal {F}_2,L_2)\le C\log N(C\delta ^2,\mathcal {F}_1,L_2)\le C\delta ^{-2/\alpha }\) and \(\mathcal {F}_2\) is Donsker if \(\alpha >1\). Combining that \(\mathcal {F}_1\) and \(\mathcal {F}_2\) are Donsker classes, it is easy to see that \(\mathcal {F}\) is also a Donsker class.
First we prove consistency. Let \(F(.|\mathbf{X})\) be the conditional cumulative distribution function of Y. Uniformly for all \(\varvec{\beta }\in \mathcal {B}\), we have
using Proposition 1. Furthermore, by the Glivenko–Cantelli Theorem (\(\mathcal {F}\) is Donsker implies it is Glivenko–Cantelli), \(\sup _{f\in \mathcal {F}}(P_n-P)f=o_p(1)\). Thus uniformly for \(\varvec{\beta }\in \mathcal {B}\),
Thus
where the first and last equalities use (13), the second and third equalities use (14), and the inequality follows from the definition of \(\widehat{\varvec{\beta }}\) as an approximate minimizer of \(\Vert \Phi (\varvec{\beta },\widehat{\mathbf{m}}(.;\varvec{\beta }))\Vert \). Thus, by assumption (A4), \(\Vert \widehat{\varvec{\beta }}-\varvec{\beta }_0\Vert <\epsilon \) with probability approaching one for any \(\epsilon >0\), which shows \(\Vert \widehat{\varvec{\beta }}-\varvec{\beta }_0\Vert =o_p(1)\).
Now we consider asymptotic normality. For readability, we split the proof into several steps.
Step 1 By consistency, Lemma 19.24 in van der Vaart (1998) then implies that
where \({\mathbb {G}}_n=\sqrt{n}(P_n-P)\) is the empirical process.
Step 2 We show
In the proof of (16), we write \(\widehat{\varvec{\beta }}\) as \(\varvec{\beta }\) for simplicity of notation. Writing \(P(\phi _{\varvec{\beta },\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}})=P(\phi _{\varvec{\beta },\widehat{\mathbf{m}}}-\phi _{\varvec{\beta },\mathbf{m}})+P(\phi _{\varvec{\beta },\mathbf{m}}-\phi _{\varvec{\beta }_0,\mathbf{m}})\), we first compute \(P(\phi _{\varvec{\beta },\widehat{\mathbf{m}}}-\phi _{\varvec{\beta },\mathbf{m}})\) as follows.
By Proposition 1, the first term above is \(o(n^{-1/2})\) and the second term is \(o_p(\Vert \varvec{\beta }-\varvec{\beta }_0\Vert )\). The third term is actually zero since \(F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}|\mathbf{X}\right) =\tau \). Finally, the last term is, by Taylor’s expansion
The first term above is zero since \(\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\) is the projection of \(\mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\) onto \(\mathcal {M}_{\varvec{\beta }}\) while \(\widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}-\mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\in \mathcal {M}_{\varvec{\beta }}\). The second term above is \(o_p(n^{-1/2})\) by Proposition 1.
Now we compute \(P(\phi _{\varvec{\beta },\mathbf{m}}-\phi _{\varvec{\beta }_0,\mathbf{m}})\). We have
where in the second to last line, we used the identity
as derived in the previous subsection.
Step 3 Let \(\widetilde{\varvec{\beta }}^{(-1)}=\varvec{\beta }_0^{(-1)}-\varvec{\Psi }_1^{-1}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}\). We have
In fact, it is easy to see by the central limit theorem that \(\Vert \widetilde{\varvec{\beta }}-\varvec{\beta }_0\Vert =O_p\left( n^{-1/2}\right) \), and the proof is similar to that of Step 1 (in fact, only \(\Vert \widetilde{\varvec{\beta }}-\varvec{\beta }_0\Vert =o_p(1)\) is needed here).
Step 4 \(\sqrt{n}P_n\phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}=o_p(1)\).
Rewriting the result in Step 3 as
using the same arguments as in Step 2, we have
and thus
by the definition of \(\widetilde{\varvec{\beta }}\).
Step 5 Finish the proof. Since \(\widehat{\varvec{\beta }}\) minimizes \(\Vert \sqrt{n}P_n\phi _{\varvec{\beta },\widehat{\mathbf{m}}}\Vert \) (up to an \(o_p(1)\) term), we have \(|\sqrt{n}P_n\phi _{\widehat{\varvec{\beta }},\widehat{\mathbf{m}}}|\le |\sqrt{n}P_n\phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}|+o_p(1)=o_p(1)\).
Thus (15) can be rewritten as
Using the result in Step 2 on the left-hand side of (17), we have
This implies root-n consistency of \(\widehat{\varvec{\beta }}^{(-1)}\) as well as the asymptotic normality.\(\square \)
Proposition 1
For \(\alpha >5/2\), \(\alpha '>1/2\),
and
In particular, all rates above are of order \(o_p(n^{-1/4})\). The term \((\log n)^{1/2}\) can be roughly regarded as the cost of taking supremum over \(\varvec{\beta }\in \mathcal {B}\).
Proof
For brevity, we only prove the first rate; the second is easily derived from the first. The third rate is also easier to establish since \(\widehat{\mathbf{H}}\) is obtained by minimizing a smooth weighted least squares criterion (unlike quantile regression). If \(\mathbf{g}'\) were known, we would have the standard nonparametric rate \(n^{-\alpha '/(2\alpha '+1)}\) for \(\widehat{\mathbf{H}}\) (up to a logarithmic term). The other term arises because \(\mathbf{g}'\) is estimated by \(\widehat{\mathbf{g}}'\), contributing a term of \(O_p(n^{-(\alpha -1)/(2\alpha +1)}(\log n)^{1/2})\). Since the arguments are standard, the details for the rates of \(\widehat{\mathbf{H}}\) are omitted.
Now we set out to show the uniform convergence rate of \(\widehat{\mathbf{g}}\). Note that \(\widehat{\mathbf{g}}(u,\varvec{\beta })=\widehat{\varvec{\Theta }}^{\mathrm{T}}\mathbf{B}(u)\) where \(\widehat{\varvec{\Theta }}=\widehat{\varvec{\Theta }}(\varvec{\beta })\) is the minimizer of
Let \(\varvec{\Theta }_0=\varvec{\Theta }_0(\varvec{\beta })\) be such that \(\Vert \varvec{\Theta }_0^{\mathrm{T}}\mathbf{B}(.)-\mathbf{g}(.;\varvec{\beta })\Vert _\infty \le CK^{-d}.\) Define \(\varvec{\theta }=\mathrm{vec}(\varvec{\Theta })\), \(\widehat{\varvec{\theta }}=\mathrm{vec}(\widehat{\varvec{\Theta }})\), \(\varvec{\theta }_0(\varvec{\beta })=\mathrm{vec}(\varvec{\Theta }_0(\varvec{\beta }))\). Note that \(\mathbf{B}^{\mathrm{T}}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\varvec{\Theta }\mathbf{Z}_i\) can also be written as \((\mathbf{Z}_i\otimes \mathbf{B}^{\mathrm{T}}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta }))^{\mathrm{T}}\varvec{\theta }\).
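To make the spline approximation step concrete, the following sketch checks that the sup-norm error of a least-squares spline fit decreases as the basis dimension grows, in line with the bound \(\Vert \varvec{\Theta }_0^{\mathrm{T}}\mathbf{B}(.)-\mathbf{g}(.;\varvec{\beta })\Vert _\infty \le CK^{-d}\). It is a toy illustration under our own choices: the paper uses a B-spline basis \(\mathbf{B}(u)\), whereas for simplicity we use the equivalent cubic truncated-power basis, and a sine function stands in for the unknown \(\mathbf{g}\).

```python
import numpy as np

def truncated_power_basis(u, knots):
    """Cubic truncated-power basis: 1, u, u^2, u^3, (u - kappa_j)_+^3."""
    cols = [np.ones_like(u), u, u**2, u**3]
    cols += [np.clip(u - k, 0.0, None)**3 for k in knots]
    return np.column_stack(cols)

def sup_error(n_knots):
    """Sup-norm error of a least-squares spline fit to a smooth target."""
    u = np.linspace(0.0, 1.0, 2000)
    g = np.sin(2.0 * np.pi * u)                    # smooth stand-in for g(.)
    knots = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]   # interior knots
    B = truncated_power_basis(u, knots)
    theta, *_ = np.linalg.lstsq(B, g, rcond=None)
    return np.max(np.abs(B @ theta - g))

# Error shrinks as the number of basis functions K grows.
err_small, err_large = sup_error(2), sup_error(20)
```

For a smooth target, doubling the number of knots roughly divides the sup-norm error by \(2^{d}\) with \(d=4\) for cubic splines, which is the mechanism behind the \(K^{-d}\) bias term in the proofs.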
The general strategy of the proof is similar to that in He and Shi (1994). However, besides the fact that we have a single-index model instead of a simple univariate nonparametric regression, it turns out to be nontrivial to deal with the supremum over \(\varvec{\beta }\), and this requires an important modification of the arguments used in He and Shi (1994). The proof of Proposition 1 is completed by combining Lemmas 1, 3 and 4 below. \(\square \)
Define \(m_i(\varvec{\beta })=\mathbf{g}^{\mathrm{T}}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}_i\) and \(e_i(\varvec{\beta })=Y_i-m_i(\varvec{\beta })\). Note that \(\tau -I\left\{ e_i(\varvec{\beta })\le 0\right\} \) does not have mean zero in general (unless \(\varvec{\beta }=\varvec{\beta }_0\)), but \(\mathbf{Z}_i\left( \tau -I\left\{ e_i(\varvec{\beta })\le 0\right\} \right) \) still has mean zero, as in (4), which is sufficient for our purpose.
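The mean-zero property of the quantile score at the truth can be checked by Monte Carlo. In the sketch below, the data-generating process (a sine quantile function with shifted Gaussian errors) is our own assumption for illustration: the errors have \(\tau \)-quantile exactly zero, so \(\tau -I\{e_i(\varvec{\beta }_0)\le 0\}\) has conditional mean zero given the covariates, and hence so does its product with any bounded covariate.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
tau, n = 0.3, 200_000

# Hypothetical data-generating process (for illustration only):
# Y = m(X) + eps with eps = Z - z_tau, Z ~ N(0,1), so the tau-quantile
# of eps is exactly zero and m(X) is the conditional tau-quantile of Y.
x = rng.uniform(-1.0, 1.0, n)
m = np.sin(x)
eps = rng.normal(size=n) - NormalDist().inv_cdf(tau)
y = m + eps

# Quantile score tau - 1{e_i <= 0}: mean zero at the true coefficients,
# and its product with a covariate z_i is also mean zero.
score = tau - (y - m <= 0).astype(float)
z = 1.0 + x**2
```

Replacing `m` by a misspecified function (say `np.sin(2 * x)`) makes the score mean drift away from zero, which is the sense in which the score only centers at the truth.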
Lemma 1
Let \(r_n=\left( \sqrt{K/n}+K^{-\alpha }\right) (\log n)^{1/2}\).
where the expectations are over \(Y_i\) conditional on \(\mathbf{X}_i\) (all expectations below are also such conditional expectations).
Proof
As in He and Shi (1994), in the proof we consider median regression with \(\tau =1/2\) and \(\rho _\tau (u)=|u|/2\); the general case can be shown in the same way. For any \(\varvec{\beta }\in \mathcal {B}\), let \(\mathcal {N}_{\varvec{\beta }}=\left\{ \varvec{\theta }^{(1)}(\varvec{\beta }),\ldots ,\varvec{\theta }^{(N)}(\varvec{\beta })\right\} \) be a \(\delta _n\)-covering of \(\left\{ \varvec{\theta }: \Vert \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\Vert \le Cr_n\right\} \), with size bounded by \(N\le (C/\delta _n)^{CK}\) (see, for example, Lemma 2.5 of van de Geer (2000) for the bound) and thus \(\log N\le C K\log n\) if we choose \(\delta _n\sim n^{-a}\) for some \(a>0\) (we will choose \(a\) large enough). Let \((\varvec{\beta }^{(1)},\ldots ,\varvec{\beta }^{(N')})\) be a \(\delta _n\)-covering of \(\mathcal {B}\) (it is well known that \(\log N'\le C\log n\)) and set \(\mathcal {N}=\cup _{1\le j\le N'}\{\varvec{\beta }^{(j)}\}\times \mathcal {N}_{\varvec{\beta }^{(j)}}\). We denote the elements of \(\mathcal {N}\) by \((\varvec{\beta }_s,\varvec{\theta }_s), 1\le s\le S\), with \(\log S\le CK\log n\).
Define \(M_{ni}(\varvec{\beta },\varvec{\theta })=\frac{1}{2}|Y_i-(\mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta }))^{\mathrm{T}}\varvec{\theta }|-\frac{1}{2}|Y_i-\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\varvec{\theta }_{0}(\varvec{\beta })|+\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) \left( 1/2-I\left\{ e_i(\varvec{\beta })\le 0\right\} \right) \), and \(M_n(\varvec{\beta },\varvec{\theta })=\sum _{i=1}^nM_{ni}(\varvec{\beta },\varvec{\theta })\). Next we claim that for any \(\varvec{\beta }\) and any \(\varvec{\theta }\) with \(\Vert \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\Vert \le Cr_n\), there exists some \(\varvec{\beta }_s,\varvec{\theta }_s\in \mathcal {N}\) such that
By the construction of \(\mathcal {N}\), it is obvious that we can find \((\varvec{\beta }_s,\varvec{\theta }_s)\in \mathcal {N}\) such that \(\Vert \varvec{\beta }-\varvec{\beta }_s\Vert +\Vert \varvec{\theta }-\varvec{\theta }_s\Vert \le C\delta _n\). In He and Shi (1994), the \(\varvec{\beta }\) in the indicator function \(I\{e_i(\varvec{\beta })\le 0\}\) in the definition of \(M_n(\varvec{\beta },\varvec{\theta })\) is actually \(\varvec{\beta }_0\), so \(M_n(\varvec{\beta },\varvec{\theta })\) is Lipschitz in \((\varvec{\beta },\varvec{\theta })\) and (18) is trivially satisfied when \(\delta _n\sim n^{-a}\) with \(a\) sufficiently large. Here, proving (18) is nontrivial since \(I\left\{ e_i(\varvec{\beta })\le 0\right\} \) is not continuous in \(\varvec{\beta }\); we deal with this term in Lemma 2 below. Since all other terms in the definition of \(M_n(\varvec{\beta },\varvec{\theta })\) are Lipschitz continuous, Lemma 2 implies that (18) holds.
By (18), we only need to show that
By simple algebra
Thus
where we used that \(\Vert \mathbf{B}(x)\Vert \le C\sqrt{K}\) at any fixed point \(x\in [a,b]\).
Furthermore, we have
Using Bernstein’s inequality, together with union bound, we have
The right-hand side converges to zero with \(a=O\left( \max \left\{ K^{3/2}r_n\log n, \sqrt{nK^{3/2}r_n^3\log n}\right\} \right) =o\left( nr_n^2\right) \). \(\square \)
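Bernstein's inequality, used repeatedly in these proofs, can be checked empirically. The sketch below (with uniform variables as a stand-in for the bounded mean-zero summands in the proof; all constants are our own choices) compares the Monte Carlo tail probability of a centered bounded sum with the bound \(P(|S_n|\ge t)\le 2\exp \{-t^2/(2(n\sigma ^2+Mt/3))\}\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, M = 500, 20_000, 1.0

# Mean-zero summands bounded by M (a stand-in for the bounded terms
# appearing in the proof); Var(Uniform(-M, M)) = M^2 / 3.
x = rng.uniform(-M, M, size=(reps, n))
sigma2 = M**2 / 3.0
s = x.sum(axis=1)

# Compare the empirical tail of |S_n| with the Bernstein bound.
t = 40.0
empirical_tail = np.mean(np.abs(s) >= t)
bernstein_bound = 2.0 * np.exp(-t**2 / (2.0 * (n * sigma2 + M * t / 3.0)))
```

The bound is conservative (here the true tail is roughly an order of magnitude smaller), which is why it can absorb the union bound over the \(\exp \{CK\log n\}\) net points in the proof.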
Lemma 2
Proof
Writing \(W_i(\varvec{\beta },\varvec{\theta })=\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) \), we only need to show that
Obviously (20) is implied by
Assume \(W_i(\varvec{\beta }_s,\varvec{\theta }_s)>0\) for now; we first show (21) without the absolute value on the left-hand side. Since \(\Vert \varvec{\beta }-\varvec{\beta }_s\Vert \le \delta _n\), by our assumption we have \(|e_i(\varvec{\beta })-e_i(\varvec{\beta }_s)|\le C\delta _n\). By the monotonicity of the function \(t\rightarrow I\left\{ e_i(\varvec{\beta })\le t\right\} \), the left-hand side of (21) is bounded by
The first term of (22) is \(o_p\left( nr_n^2\right) \), which follows easily from Bernstein's inequality, the union bound, and the fact that \(\delta _n\sim n^{-a}\) for \(a\) sufficiently large. The second term of (22) is also \(o_p\left( nr_n^2\right) \) since \(\Vert \varvec{\beta }-\varvec{\beta }_s\Vert \le \delta _n\). Obviously, using \(I\left\{ e_i(\varvec{\beta })\le 0\right\} \ge I\left\{ e_i(\varvec{\beta }_s)\le -C\delta _n\right\} \), we can also show that (22) is \(o_p\left( nr_n^2\right) \) when the sign is changed.
So far we have assumed that \(W_i(\varvec{\beta }_s,\varvec{\theta }_s)>0\). In general, we can consider the positive part and the negative part of \(W_i(\varvec{\beta }_s,\varvec{\theta }_s) \) separately and the proof is complete. \(\square \)
Lemma 3
For \(L>0\) large enough
Proof
Applying Knight's identity \(\rho _\tau (x-y)-\rho _\tau (x)=-y\left( \tau -I\{x\le 0\}\right) +\int _0^y \left( I\{x\le t\}-I\{x\le 0\}\right) dt\) twice to the two terms, we have that
Combining
and
we get the statement of the lemma if L is large enough. \(\square \)
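Knight's identity used above can be verified numerically. In the sketch below, the integral \(\int _0^y (I\{x\le t\}-I\{x\le 0\})\,dt\) is evaluated in closed form (our own computation), and both sides of the identity are compared on random inputs; since x is drawn from a continuous distribution, the boundary cases \(x=0\) and \(x=y\) occur with probability zero.

```python
import numpy as np

def rho(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0).astype(float))

def knight_rhs(x, y, tau):
    """Right-hand side of Knight's identity, with the integral
    int_0^y 1{x <= t} dt evaluated piecewise in closed form."""
    ind0 = (x <= 0).astype(float)
    pos = np.minimum(np.clip(y - x, 0.0, None), np.clip(y, 0.0, None))  # y >= 0
    neg = np.minimum(0.0, np.maximum(x, y))                             # y <  0
    integral = np.where(y >= 0, pos, neg) - y * ind0
    return -y * (tau - ind0) + integral

rng = np.random.default_rng(3)
x, y = rng.normal(size=1000), rng.normal(size=1000)
tau = 0.25
lhs = rho(x - y, tau) - rho(x, tau)
rhs = knight_rhs(x, y, tau)
```

The first right-hand term is linear in y, which yields the score part of the expansion; the integral term is nonnegative in expectation and supplies the quadratic curvature exploited in Lemma 3.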
Lemma 4
Proof
By Lemma 2, we only need to consider supremum over \((\varvec{\beta }_s,\varvec{\theta }_s)\in \mathcal {N}\). For fixed \(\varvec{\beta }\) and \(\varvec{\theta }\), using \(\sum _i\left( \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) \right) ^2=O_p\left( L^2nr_n^2\right) \), and \(|\left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}\left( \varvec{\theta }-\varvec{\theta }_0(\varvec{\beta })\right) |\le CLr_n\sqrt{K}\), we have, by Bernstein’s inequality,
Thus
which implies \(\sup _{(\varvec{\beta },\varvec{\theta })\in \mathcal {N}}\sum _i \left( \mathbf{Z}_i\otimes \mathbf{B}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta })\right) ^{\mathrm{T}}(\varvec{\theta }-\varvec{\theta }_0)(\tau -I\left\{ e_i(\varvec{\beta })\le 0\right\} )=LO_p(\sqrt{n}r_n\sqrt{K\log n})=LO_p\left( nr_n^2\right) \). \(\square \)
Zhao, W., Li, J. & Lian, H. Adaptive varying-coefficient linear quantile model: a profiled estimating equations approach. Ann Inst Stat Math 70, 553–582 (2018). https://doi.org/10.1007/s10463-017-0599-8