Model averaging for semiparametric varying coefficient quantile regression models

Annals of the Institute of Statistical Mathematics

Abstract

In this study, we propose a model averaging approach to estimating conditional quantiles based on a set of semiparametric varying coefficient models. In contrast to the existing literature on the subject, we consider a particular form for all candidates, in which each sub-model contains exactly one varying coefficient, and all the candidates under investigation may be misspecified. We propose a weight choice criterion based on a leave-more-out cross-validation objective function. Moreover, the resulting averaging estimator is more robust against model misspecification, because the weights adjust the relative importance of the varying and constant coefficients for the same predictors. We establish statistical properties of each sub-model and the asymptotic optimality of the weight selection method. Simulation studies show that the proposed procedure has satisfactory prediction accuracy, and an analysis of skin cutaneous melanoma data further supports the merits of the proposed approach.
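
As a computational sketch of the weight-choice step (the function names, data layout, and solver below are illustrative assumptions, not taken from the paper): each of the p candidate sub-models is fitted on the first n0 observations, its predictions on the remaining n − n0 observations are collected column-wise in a matrix, and the weights are chosen on the probability simplex by minimizing the average check loss of the averaged prediction.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(r, tau):
    """Quantile check loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (r < 0))

def choose_weights(preds, y_val, tau):
    """Leave-more-out CV weight choice (sketch).

    preds : (n - n0, p) hold-out predictions of the p candidate sub-models,
            each fitted on the first n0 observations.
    y_val : hold-out responses.
    The CV objective is piecewise linear in w, so a linear-programming
    formulation would be more robust; SLSQP keeps the sketch short.
    """
    p = preds.shape[1]
    obj = lambda w: np.mean(check_loss(y_val - preds @ w, tau))
    res = minimize(obj, np.full(p, 1.0 / p), method="SLSQP",
                   bounds=[(0.0, 1.0)] * p,
                   constraints=({"type": "eq",
                                 "fun": lambda w: w.sum() - 1.0},))
    return res.x
```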

References

  • Angrist, J., Chernozhukov, V., Fernández-Val, I. (2006). Quantile regression under misspecification, with an application to the U.S. wage structure. Econometrica, 74, 539–563.

  • Cai, Z., Xiao, Z. (2012). Semiparametric quantile regression estimation in dynamic models with partially varying coefficients. Journal of Econometrics, 167, 413–425.

  • Cai, Z., Xu, X. (2008). Nonparametric quantile estimations for dynamic smooth coefficient models. Journal of the American Statistical Association, 103, 1595–1608.

  • Cai, Z., Chen, L., Fang, Y. (2018). A semiparametric quantile panel data model with an application to estimating the growth effect of FDI. Journal of Econometrics, 206, 531–553.

  • Chai, H., Shi, X., Zhang, Q., Zhao, Q., Huang, Y., Ma, S. (2017). Analysis of cancer gene expression data with an assisted robust marker identification approach. Genetic Epidemiology, 41, 779–789.

  • Fitzenberger, B., Koenker, R., Machado, J. (Eds.). (2002). Economic applications of quantile regression. Heidelberg, Germany: Physica-Verlag.

  • Hansen, B. E. (2007). Least squares model averaging. Econometrica, 75, 1175–1189.

  • Hjort, N. L., Claeskens, G. (2003). Frequentist model average estimators. Journal of the American Statistical Association, 98, 879–899.

  • Kai, B., Li, R., Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. The Annals of Statistics, 39, 305–332.

  • Knight, K. (1998). Limiting distributions for \(L_1\) regression estimators under general conditions. Annals of Statistics, 26, 755–770.

  • Koenker, R., Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33–50.

  • Kuester, K., Mittnik, S., Paolella, M. (2006). Value-at-risk prediction: A comparison of alternative strategies. Journal of Financial Econometrics, 4, 53–89.

  • Li, D., Linton, O., Lu, Z. (2015). A flexible semiparametric forecasting model for time series. Journal of Econometrics, 187, 345–357.

  • Li, G., Li, Y., Tsai, C. L. (2015). Quantile correlations and quantile autoregressive modeling. Journal of the American Statistical Association, 110, 246–261.

  • Li, J., Xia, X., Wong, W. K., Nott, D. (2018). Varying-coefficient semiparametric model averaging prediction. Biometrics, 74, 1417–1426.

  • Li, X., Ma, X., Zhang, J. (2018). Conditional quantile correlation screening procedure for ultrahigh-dimensional varying coefficient models. Journal of Statistical Planning and Inference, 197, 62–92.

  • Li, Y., Graubard, B. I., Korn, E. L. (2010). Application of nonparametric quantile regression to body mass index percentile curves from survey data. Statistics in Medicine, 29, 558–572.

  • Lian, H. (2015). Quantile regression for dynamic partially linear varying coefficient time series models. Journal of Multivariate Analysis, 141, 49–66.

  • Lin, H., Fei, Z., Li, Y. (2016). A semiparametrically efficient estimator of the time-varying effects for survival data with time-dependent treatment. Scandinavian Journal of Statistics, 43, 649–663.

  • Liu, J., Huang, J., Zhang, Y., Lan, Q., Rothman, N., Zheng, T., Ma, S. (2013). Identification of gene-environment interactions in cancer studies using penalization. Genomics, 102, 189–194.

  • Lu, X., Su, L. (2015). Jackknife model averaging for quantile regressions. Journal of Econometrics, 188, 40–58.

  • Ma, S., Yang, L., Romero, R., Cui, Y. (2011). Varying coefficient model for gene-environment interaction: A non-linear look. Bioinformatics, 27, 2119–2126.

  • Mack, Y., Silverman, B. (1982). Weak and strong uniform consistency of kernel regression estimates. Probability Theory and Related Fields, 61, 405–415.

  • Nan, Y., Yang, Y. (2014). Variable selection diagnostics measures for high-dimensional regression. Journal of Computational and Graphical Statistics, 23, 636–656.

  • Shan, K., Yang, Y. (2009). Combining regression quantile estimators. Statistica Sinica, 19, 1171–1191.

  • Sharafeldin, N., Slattery, M. L., Liu, Q., Franco-Villalobos, C., Caan, B. J., Potter, J. D., Yasui, Y. (2015). A candidate-pathway approach to identify gene-environment interactions: Analyses of colon cancer risk and survival. Journal of the National Cancer Institute, 107(9), djv160.

  • Shen, Y., Liang, H. (2017). Quantile regression for partially linear varying-coefficient model with censoring indicators missing at random. Computational Statistics and Data Analysis, 117, 1–18.

  • Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall.

  • Stock, J., Watson, M. (2004). Combination forecasts of output growth in a seven-country data set. Journal of Forecasting, 23, 405–430.

  • Van der Vaart, A., Wellner, J. A. (1996). Weak convergence and empirical processes: With applications to statistics. New York: Springer.

  • Wang, M., Zhang, X., Wan, A. T. K., You, K., Zou, G. (2021). Jackknife model averaging for high-dimensional quantile regression. Biometrics, 2021, 1–12.

  • Wheelock, D. C., Wilson, P. W. (2008). Non-parametric, unconditional quantile estimation for efficiency analysis with an application to Federal Reserve check processing operations. Journal of Econometrics, 145, 209–225.

  • Winnepenninckx, V., Lazar, V., Michiels, S., Dessen, P., Stas, M., Alonso, S. R., Avril, M., Romero, P. L., Robert, T., Balacescu, O., Eggermont, A. M., Lenoir, G., Sarasin, A., Tursz, T., Oord, J. J., Spatz, A. (2006). Gene expression profiling of primary cutaneous melanoma and clinical outcome. Journal of the National Cancer Institute, 98, 472–482.

  • Wu, M., Huang, J., Ma, S. (2017). Identifying gene-gene interactions using penalized tensor regression. Statistics in Medicine, 37, 598–610.

  • Xu, Y., Wu, M., Ma, S., Ahmed, S. (2018). Robust gene environment interaction analysis using penalized trimmed regression. Journal of Statistical Computation and Simulation, 88, 3502–3528.

  • Yang, Y. (2001). Adaptive regression by mixing. Journal of the American Statistical Association, 96, 574–588.

  • Yang, Y. (2007). Prediction/estimation with simple linear models: Is it really that simple? Econometric Theory, 23, 1–36.

  • Ye, C., Yang, Y., Yang, Y. (2018). Sparsity oriented importance learning for high-dimensional linear regression. Journal of the American Statistical Association, 113, 1797–1812.

  • Zhan, Z., Yang, Y. (2022). Profile electoral college cross-validation. Information Sciences, 586, 24–40.

  • Zhu, R., Wan, A. T. K., Zhang, X., Zhou, G. (2019). A Mallows-type model averaging estimator for the varying coefficient partially linear model. Journal of the American Statistical Association, 114, 882–892.

Acknowledgements

Lin’s work was supported by the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (No. 19XNB014).

Author information

Corresponding author

Correspondence to Cunjie Lin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 367 KB)

Appendix

A.1. Proof of Theorem 1

We first introduce the following lemma, which is a direct result of Mack and Silverman (1982) and will be used in our proofs.

Lemma 1

Let \((\textbf{X}_{1},Y_{1}),... ,(\textbf{X}_{n},Y_{n})\) be i.i.d. random vectors, where \(Y_{1},... , Y_n\) are scalar random variables. Assume that \(E|Y|^{r}<\infty\) and \(\sup _\textbf{x}\int |y|^{r}f(\textbf{x},y)dy<\infty\), where f denotes the joint density of \((\textbf{X},Y)\). Let K be a bounded positive function with bounded support, satisfying a Lipschitz condition. Then,

$$\begin{aligned} \mathop {\sup }_{\textbf{x}\in D}\left| \frac{1}{n}\sum _{i=1}^n{K_h\left( \textbf{X}_{i}-\textbf{x}\right) Y_{i}-E\left[ K_h\left( \textbf{X}_{i}-\textbf{x}\right) Y_{i}\right] }\right| =O_{p}\left( \frac{{\log }^{1/2}(1/h)}{\sqrt{nh}}\right) , \end{aligned}$$

provided that \(n^{2\eta -1}h \rightarrow \infty\) for some \(\eta <1-r^{-1}\).
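
As an illustration of the lemma, the following Monte Carlo sketch (under assumed choices: \(X\sim U(0,1)\), \(Y=\sin (2\pi X)\) plus standard normal noise, Epanechnikov kernel, with the expectation approximated from one very large independent sample) shows the sup-deviation scaling roughly like \(\sqrt{\log (1/h)/(nh)}\):

```python
import numpy as np

rng = np.random.default_rng(0)
K = lambda t: 0.75 * np.maximum(1.0 - t**2, 0.0)   # Epanechnikov kernel
m = lambda x: np.sin(2.0 * np.pi * x)
grid = np.linspace(0.1, 0.9, 41)

def sup_deviation(n, h, reps=100):
    # E[K_h(X - x) Y] = E[K_h(X - x) m(X)] since the noise has mean zero;
    # approximate it from one large independent sample.
    Xb = rng.uniform(size=500_000)
    EKY = np.array([np.mean(K((Xb - x) / h) / h * m(Xb)) for x in grid])
    devs = []
    for _ in range(reps):
        X = rng.uniform(size=n)
        Y = m(X) + rng.normal(size=n)
        est = np.array([np.mean(K((X - x) / h) / h * Y) for x in grid])
        devs.append(np.max(np.abs(est - EKY)))
    return np.median(devs)

rate = lambda n, h: np.sqrt(np.log(1.0 / h) / (n * h))
for n, h in [(1_000, 0.15), (8_000, 0.075)]:
    print(n, h, sup_deviation(n, h) / rate(n, h))   # ratio stays of order one
```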

Now, we prove the results of Theorem 1. First, we introduce Knight’s identity (Knight 1998), which will be used in the following proof,

$$\begin{aligned} \rho _\tau (u+v)-\rho _\tau (u)=v\psi _\tau (u)+\int _0^{-v}\left[ I(u\le z)-I(u\le 0)\right] dz. \end{aligned}$$
(4)
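
This identity is straightforward to verify numerically (a self-contained sketch with \(\psi _\tau (u)=\tau -I(u<0)\); the oriented integral is approximated by the trapezoid rule on a fine grid):

```python
import numpy as np

rho = lambda u, tau: u * (tau - (u < 0))      # check loss
psi = lambda u, tau: tau - (u < 0)            # a.e. derivative of rho

rng = np.random.default_rng(0)
for _ in range(1000):
    u, v = rng.normal(size=2)
    tau = rng.uniform(0.05, 0.95)
    # oriented integral from 0 to -v of I(u <= z) - I(u <= 0)
    z = np.linspace(0.0, -v, 20_001)
    g = (u <= z).astype(float) - float(u <= 0)
    integral = np.sum(0.5 * (g[:-1] + g[1:]) * np.diff(z))
    lhs = rho(u + v, tau) - rho(u, tau)
    rhs = v * psi(u, tau) + integral
    assert abs(lhs - rhs) < 1e-3              # grid error only
```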

For given u, \(\tau\), and j, define \(\varepsilon _{\tau ,i}=Y_i-Q_{\tau }({\textbf{X}}_i,U_i)\) and \(r_{i(j)}=Q_{\tau }({\textbf {X}}_i,U_i)-{{\textbf {X}}}_{i(j)}^{\top }{{\varvec{\theta }}}^*_{(j)}\). Recall that \({{\varvec{\theta }}}_{(j)}^{*}=(a_{j}^{*},b^{*}_{j},{{\varvec{\beta }}}^{*\top }_{(j)})^{\top }\). To simplify the notation, we write \(\varepsilon _i\) and \(\varepsilon\) for \(\varepsilon _{\tau ,i}\) and \(\varepsilon _{\tau }\), respectively. For \({{\varvec{\theta }}}_{(j)}\in \mathbb {R}^{p+1}\), we define

$$\begin{aligned} G_j(\tau , {\varvec{\theta }}_{(j)})=\sum _{i=1}^{n}\left[ \rho _{\tau }(Y_{i}-\textbf{X}_{i(j)}^\top {{\varvec{\theta }}}_{(j)})K_{h_{j}}(U_{i}-u)\right] . \end{aligned}$$
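
Computationally, minimizing \(G_j\) is a kernel-weighted quantile regression. Since \(\rho _\tau (cu)=c\,\rho _\tau (u)\) for \(c\ge 0\), the kernel weights can be absorbed by rescaling each observation, so any ordinary quantile-regression solver applies. A minimal sketch (the design construction and the use of statsmodels are illustrative assumptions, not the authors' implementation):

```python
import numpy as np
import statsmodels.api as sm

def epa(t):
    return 0.75 * np.maximum(1.0 - t**2, 0.0)        # Epanechnikov kernel

def fit_submodel_j(y, X, U, j, u, tau, h):
    """Minimize sum_i rho_tau(y_i - x_{i(j)}' theta) K_h(U_i - u).

    Uses w * rho_tau(r) = rho_tau(w * r) for w >= 0: scaling each row by its
    kernel weight turns the weighted fit into an ordinary quantile regression.
    Observations outside the kernel window get weight zero and drop out.
    """
    n, p = X.shape
    others = np.delete(np.arange(p), j)
    # local-linear design: varying coefficient for X_j, constants for the rest
    D = np.column_stack([X[:, j], X[:, others], (U - u) / h * X[:, j]])
    w = epa((U - u) / h) / h
    fit = sm.QuantReg(w * y, D * w[:, None]).fit(q=tau)
    return fit.params            # estimates of (a_j, beta_(j), h * b_j) at u
```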

We will show that for any \(s>0\), there is a constant \(M>0\) such that for all n sufficiently large, we have

$$\begin{aligned} P\left\{ \inf _{\left\| \textbf{v}_{(j)}\right\| =M} G_j(\tau , {\varvec{\theta }}_{(j)}^{s})>G_j(\tau , {\varvec{\theta }}_{(j)}^{*})\right\} \ge 1-s, \end{aligned}$$
(5)

where \({\varvec{\theta }}_{(j)}^{s}={\varvec{\theta }}_{(j)}^{*}+\delta _s\textbf{v}_{(j)}\), \(\textbf{v}_{(j)}=(v_1,\ldots ,v_{p+1})^{\top }\), and \(\delta _s=o(1)\). By Knight’s identity, we obtain

$$\begin{aligned}{} & {} G_j(\tau , {\varvec{\theta }}_{(j)}^{s})-G_j(\tau , {\varvec{\theta }}_{(j)}^{*})\\{} & {} \quad =\sum _{i=1}^{n}\left[ \rho _{\tau }(Y_{i}-\textbf{X}_{i(j)}^\top {{\varvec{\theta }}}_{(j)}^{s})K_{h_{j}}(U_{i}-u)\right] -\sum _{i=1}^{n}\left[ \rho _{\tau }(Y_{i}-\textbf{X}_{i(j)}^\top {{\varvec{\theta }}}_{(j)}^{*})K_{h_{j}}(U_{i}-u)\right] \\{} & {} \quad =-\delta _s\sum _{i=1}^{n}K_{h_j}(U_i-u)\psi _{\tau }(\varepsilon _i+r_{i(j)})\textbf{X}_{i(j)}^{\top }\textbf{v}_{(j)}\\{} & {} \quad \,\,\,+\sum _{i=1}^{n}K_{h_j}(U_i-u)\int _0^{\delta _s\textbf{X}_{i(j)}^{\top }\textbf{v}_{(j)}}[I(\varepsilon _{i}\le -r_{i(j)}+z)-I(\varepsilon _{i}\le -r_{i(j)})]dz\\{} & {} \quad \equiv G_{j, 1}\left( \textbf{v}_{(j)}\right) +G_{j, 2}\left( \textbf{v}_{(j)}\right) +G_{j, 3}\left( \textbf{v}_{(j)}\right) , \end{aligned}$$

where

$$\begin{aligned}{} & {} G_{j, 1}\left( \textbf{v}_{(j)}\right) =-\delta _s \sum _{i=1}^{n}K_{h_j}(U_i-u) \psi _{\tau }\left( \varepsilon _{i}+r_{i(j)}\right) \textbf{X}_{i(j)}^{\top } \textbf{v}_{(j)}, \\{} & {} G_{j, 2}\left( \textbf{v}_{(j)}\right) =\sum _{i=1}^{n} E\left[ K_{h_j}(U_i-u)\int _{0}^{\delta _s \textbf{X}_{i(j)}^{\top } \textbf{v}_{(j)}} [I(\varepsilon _{i}\le -r_{i(j)}+z)-I(\varepsilon _{i}\le -r_{i(j)})]dz \Big | \textbf{X}_{i},U_i\right] , \\{} & {} G_{j, 3}\left( \textbf{v}_{(j)}\right) =\sum _{i=1}^{n}K_{h_j}(U_i-u)\int _{0}^{\delta _s \textbf{X}_{i(j)}^{\top } \textbf{v}_{(j)}}[I(\varepsilon _{i}\le -r_{i(j)}+z)-I(\varepsilon _{i}\le -r_{i(j)})]dz\\{} & {} \,\,\,\,\,\,\,\,\,\,\,\,\,\,-\sum _{i=1}^nE\left[ K_{h_j}(U_i-u)\int _{0}^{\delta _s \textbf{X}_{i(j)}^{\top } \textbf{v}_{(j)}} [I(\varepsilon _{i}\le -r_{i(j)}+z)-I(\varepsilon _{i}\le -r_{i(j)})]dz \Big |\textbf{X}_{i},U_i\right] . \end{aligned}$$

By equation (2), we have \(E\left[ G_{j, 1}\left( \textbf{v}_{(j)}\right) \right] =0\). By Assumptions (A6) and (A9),

$$\begin{aligned} E\left( G_{j, 1}\left( \textbf{v}_{(j)}\right) \right) ^2\le C_K^2\delta _s^2\sum _{i=1}^{n}{} \textbf{v}_{(j)}^{\top }\textbf{B}_{(j)}(u)\textbf{v}_{(j)}\le C_K^2\overline{C}_{B_{(j)}}n\delta _s^2\Vert \textbf{v}_{(j)}\Vert ^2, \end{aligned}$$

where \(C_K\) is a finite positive constant. Hence \(G_{j, 1}\left( \textbf{v}_{(j)}\right) =O_p\left( \overline{C}_{B_{(j)}}^{1/2}\delta _s\sqrt{n}\right) \Vert \textbf{v}_{(j)}\Vert\). By a Taylor expansion, we have

$$\begin{aligned} G_{j,2}\left( \textbf{v}_{(j)}\right)= & {} \frac{1}{2}\delta _s^2\textbf{v}_{(j)}^{\top }\sum _{i=1}^{n}K_{h_j}(U_i-u)[f(-r_{i(j)}|\textbf{X}_i,U_i) +o(1)]\textbf{X}_{i(j)}\textbf{X}_{i(j)}^{\top }\textbf{v}_{(j)}\\= & {} \frac{1}{2}\delta _s^2\textbf{v}_{(j)}^{\top }n\left[ f_U(u)\textbf{A}_{(j)}(u)+o(1)\right] \textbf{v}_{(j)}\\\ge & {} \frac{n\delta _s^2f_U(u)}{2}\underline{C}_{A_{(j)}}\Vert \textbf{v}_{(j)}\Vert ^2. \end{aligned}$$

Analogous to \(G_{j, 1}\left( \textbf{v}_{(j)}\right)\), by Assumption (A8), we obtain that \(G_{j, 3}\left( \textbf{v}_{(j)}\right) =O_p(\overline{C}_{A_{(j)}}^{1/2}\delta _s\sqrt{n})\Vert \textbf{v}_{(j)}\Vert\). Thus we get (5), which implies that, with probability approaching 1, there exists a local minimizer \(\widehat{{\varvec{\theta }}}_{(j)}\) in the ball \(\mathcal {B}_{M,\delta _s}=\left\{ {\varvec{\theta }}_{(j)}^{*}+\delta _s\textbf{v}_{(j)}:\Vert \textbf{v}_{(j)}\Vert \le M\right\}\) such that \(\Vert \widehat{{\varvec{\theta }}}_{(j)}-{{\varvec{\theta }}}_{(j)}^{*}\Vert =O_p(\delta _s)=o_p(1)\). By the convexity of \(G_j(\tau ,{\varvec{\theta }}_{(j)})\), \(\widehat{{\varvec{\theta }}}_{(j)}\) is also the global minimizer. Thus, Theorem 1 is proved.   \(\square\)

A.2. Proof of Theorem 2

For given \(\tau\) and j, recall that \(\varepsilon _{i}=Y_i-Q_{\tau }({\textbf{X}}_i,U_i),\) \(r_{i(j)}=Q_{\tau }({\textbf {X}}_i,U_i)-{{\textbf {X}}}_{i(j)}^{\top }{{\varvec{\theta }}}^*_{(j)}\). Let \(\widehat{{\varvec{\omega }}}_{j}=\sqrt{nh_{j}}\left( \widehat{a}_{j}-a_j^{*}, \widehat{{\varvec{\beta }}}^{\top }_{(j)}-{{\varvec{\beta }}}^{*\top }_{(j)},h_{j}(\widehat{b}_{j}-b_j^{*})\right) ^{\top }\). It follows from Theorem 1 in Cai and Xu (2008) that

$$\begin{aligned} \widehat{{\varvec{\omega }}}_{j}=-f_{U}^{-1}(u)\textbf{S}_{j}^{-1}(u)\textbf{W}_{n,j}(u)+o_{p}(1), \end{aligned}$$

where

$$\begin{aligned}{} & {} \textbf{W}_{n,j}(u)=\frac{1}{\sqrt{nh_{j}}}\sum _{i=1}^nK_{h_{j}}(U_{i}-u) \psi _{\tau }(\varepsilon _i+r_{i(j)})\textbf{X}_{i(j)}^{\circ },\\{} & {} \textbf{S}_j(u)=E\left[ f(-r_{(j)}|{\textbf {X}},U){\textbf {X}}_{(j)}^{\circ }{\textbf {X}}_{(j)}^{\circ \top }|U=u\right] , \end{aligned}$$

and \({{\textbf {X}}}^{\circ }_{i(j)}=\left( X_{ij},X_{i1},\ldots ,X_{i(j-1)},X_{i(j+1)},\ldots ,X_{ip},(U_{i}-u)X_{ij}/h_{j}\right) ^{\top }\). So we have

$$\begin{aligned} \sqrt{nh_{j}}\left( \widehat{{\varvec{\theta }}}_{(-j)}-{{\varvec{\theta }}}_{(-j)}^{*}\right) =-f_{U}^{-1}(u)\textbf{C}_{(j)}^{-1}(u)\widetilde{\textbf{W}}_{n,j}(u)+o_{p}(1), \end{aligned}$$
(6)

where \(\widetilde{\textbf{W}}_{n,j}(u)=\frac{1}{\sqrt{nh_{j}}}\sum _{i=1}^{n}K_{h_{j}} (U_{i}-u)\psi _{\tau }(\varepsilon _i+r_{i(j)})\widetilde{\textbf{X}}_{i(j)}\) and \(\widetilde{{\textbf {X}}}_{i(j)}=(X_{ij},X_{i1},\ldots ,X_{i(j-1)},X_{i(j+1)},\ldots ,X_{ip})^\top\). Note that \(E\left( \widetilde{\textbf{W}}_{n,j}(u)\right) =0\) by (2), and

$$\begin{aligned}{} & {} \textrm{Var}\left( \widetilde{\textbf{W}}_{n,j}(u)\right) =\frac{1}{nh_j}\sum _{i=1}^{n}E\left[ K_{h_j}^2(U_i-u) \psi _{\tau }^2(\varepsilon _i+r_{i(j)})\widetilde{{\textbf {X}}}_{i(j)}\widetilde{{\textbf {X}}}_{i(j)}^{\top }\right] \\{} & {} \quad =\frac{1}{n}\sum _{i=1}^{n}v_0f_U(u)E[\psi _{\tau }^2(\varepsilon +r_{(j)})\widetilde{{\textbf {X}}}_{(j)}\widetilde{{\textbf {X}}}_{(j)}^{\top }|U=u]+o(1)\\{} & {} \quad = v_0f_U(u)\textbf{D}_{(j)}(u). \end{aligned}$$
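
Here \(v_0\) denotes the usual kernel constant \(\int K^2(t)\,dt\) (we assume the main text's standard convention). For the Epanechnikov kernel, \(v_0=3/5\), which a one-line quadrature confirms:

```python
from scipy.integrate import quad

# integrate the squared Epanechnikov kernel over its support [-1, 1]
v0, _ = quad(lambda t: (0.75 * (1.0 - t**2))**2, -1.0, 1.0)
print(round(v0, 6))   # 0.6
```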

Then, for any \(\epsilon >0\), defining \(\eta _{i(j)}=(nh_j)^{-1/2}K_{h_j}(U_i-u)\psi _{\tau }(\varepsilon _i+r_{i(j)})\widetilde{{\textbf {X}}}_{i(j)}\), we have

$$\begin{aligned}{} & {} \sum _{i=1}^{n}E\left\{ \Vert \eta _{i(j)}\Vert ^2I\left[ \Vert \eta _{i(j)}\Vert \ge \epsilon \right] \right\} \\{} & {} \quad =nE\left\{ \Vert \eta _{i(j)}\Vert ^2I[\Vert \eta _{i(j)}\Vert \ge \epsilon ]\right\} \le n\left\{ E\Vert \eta _{i(j)}\Vert ^4\right\} ^{1/2}\{P(\Vert \eta _{i(j)}\Vert \ge \epsilon )\}^{1/2}\\{} & {} \quad \le n\epsilon ^{-2}E\Vert \eta _{i(j)}\Vert ^4. \end{aligned}$$

Furthermore, by Assumptions (A6) and (A10),

$$\begin{aligned} E\Vert \eta _{i(j)}\Vert ^4= & {} (nh_j)^{-2}E\left\{ K_{h_j}^4(U_i-u)\left[ \textrm{tr}\left( \psi _{\tau }^2(\varepsilon _i+r_{i(j)}) \widetilde{{\textbf {X}}}_{i(j)}\widetilde{{\textbf {X}}}_{i(j)}^{\top }\right) \right] ^2\right\} \\\le & {} (nh_j)^{-2}C_K^4E\left\{ \left[ \textrm{tr}\left( \widetilde{{\textbf {X}}}_{i(j)}\widetilde{{\textbf {X}}}_{i(j)}^{\top } \right) \right] ^2\right\} \\\le & {} (nh_j)^{-2}C_K^4E\left\| \widetilde{{\textbf {X}}}_{i(j)}\right\| ^4 =O\left( (nh_j)^{-2}\right) . \end{aligned}$$

Thus, \(\sum _{i=1}^{n}E\left\{ \Vert \eta _{i(j)}\Vert ^2I\left[ \Vert \eta _{i(j)}\Vert \ge \epsilon \right] \right\} =O\left( (nh_j^2)^{-1}\right) =o(1).\) According to the Lindeberg–Feller central limit theorem, we obtain

$$\begin{aligned} \widetilde{\textbf{W}}_{n,j}(u) {\mathop {\rightarrow }\limits ^{d}} N\left( 0, v_0f_U(u)\textbf{D}_{(j)}(u)\right) . \end{aligned}$$

By Slutsky's theorem, we have

$$\begin{aligned} \sqrt{nh_j}\left( \widehat{{\varvec{\theta }}}_{(-j)}-{{\varvec{\theta }}}_{(-j)}^{*}\right) {\mathop {\rightarrow }\limits ^{d}} N\left( \textbf{0},\frac{v_{0}}{f_{U}(u)}{} \textbf{C}^{-1}_{(j)}(u)\textbf{D}_{(j)}(u)\textbf{C}^{-1}_{(j)}(u)\right) . \end{aligned}$$

This completes the proof of Theorem 2.   \(\square\)

A.3. Proof of Theorem 3

To show the results, it suffices to show that \(\sup _{\textbf{w}\in \mathcal {H}}\left| \frac{CV_{n_0}(\textbf{w})-QPE_{n}(\textbf{w})}{QPE_{n}(\textbf{w})}\right| =o_{p}(1).\) For notational simplicity, for a given \(\tau\), let \(\widehat{Q}_{j,n_0}(\cdot )=\widehat{Q}_{\tau ,n_0}^{(j)}(\cdot )\) and \(\widehat{Q}_{j}(\cdot )=\widehat{Q}_{\tau }^{(j)}(\cdot )\). By the definitions of \(CV_{n_0}({\textbf { w}})\) and \(QPE_{n}(\textbf{w})\), we have

$$\begin{aligned} \begin{aligned} CV_{n_0}(\textbf{w})-QPE_{n}(\textbf{w})&= \frac{1}{n-n_0}\sum _{i=n_0+1}^{n}\left[ \rho _\tau \left( Y_i-\sum _{j=1}^pw_j\widehat{Q}_{j,n_0}({\textbf {X}}_i,U_i)\right) -\rho _\tau (\varepsilon _i)\right] \\&\,+\left[ E\rho _\tau (\varepsilon )-QPE_{n}(\textbf{w})\right] +\frac{1}{n-n_0}\sum _{i=n_0+1}^n\left[ \rho _\tau (\varepsilon _i)-E\rho _\tau (\varepsilon )\right] . \end{aligned} \end{aligned}$$

Noting that \(E\left[ \left( Q_{\tau }({\textbf {X}},U)-\sum _{j=1}^pw_j\widehat{Q}_j({\textbf {X}},U)\right) \psi _\tau (\varepsilon )\bigg |\mathcal {D}_n\right] =0\) and

$$\begin{aligned}{} & {} E\left[ \int _0^{\sum _{j=1}^pw_j\widehat{\scriptscriptstyle {Q}}_j(\scriptscriptstyle {\textbf{X}, U})-Q_{\tau }(\scriptscriptstyle {\textbf{X}, U})}[I(\varepsilon \le z)-I(\varepsilon \le 0)]dz\bigg |\mathcal {D}_n\right] \\{} & {} \quad = E_{\textbf{X}_i,U_i}\left\{ \int _0^{\sum _{j=1}^p w_j\widehat{\scriptscriptstyle {Q}}_j(\textbf{X}_i,U_i)-Q_{\tau }(\textbf{X}_i,U_i)}[F(z|{\textbf {X}}_i,U_i)-F(0|{\textbf {X}}_i,U_i)]dz\right\} , \end{aligned}$$

where \(E_{\textbf{X}_i,U_i}\) denotes the expectation with respect to \(\{{\textbf {X}}_i,U_i\}\). Together with Knight’s identity, we obtain the decomposition

$$\begin{aligned} CV_{n_0}(\textbf{w})-QPE_{n}(\textbf{w})= CV_1({\textbf { w}})+CV_2({\textbf { w}})+CV_3({\textbf { w}})+CV_4({\textbf { w}})+CV_5, \end{aligned}$$

where

$$\begin{aligned}{} & {} CV_1({\textbf { w}})=\frac{1}{n-n_0}\sum _{i=n_0+1}^n\left[ (Q_{\tau }({\textbf {X}}_i,U_i)-\sum _{j=1}^pw_j \widehat{Q}_{j,n_0}({\textbf {X}}_i,U_i))\psi _\tau (\varepsilon _i)\right] ,\\{} & {} CV_2({\textbf { w}})=\frac{1}{n-n_0}\sum _{i=n_0+1}^n\int _0^{\sum _{j=1}^pw_j\widehat{\scriptscriptstyle {Q}}_{j,n_0} (\scriptscriptstyle {\textbf{X}_i, U_i})-Q_{\tau }(\scriptscriptstyle {\textbf{X}_i, U_i})}[I(\varepsilon _i\le z)-I(\varepsilon _i\le 0)]\\{} & {} \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,-[F(z|{\textbf {X}}_i,U_i)-F(0|{\textbf {X}}_i,U_i)]dz,\\{} & {} CV_3({\textbf { w}})=\frac{1}{n-n_0}\sum _{i=n_0+1}^n\Bigg [\int _0^{\sum _{j=1}^pw_j \widehat{\scriptscriptstyle {Q}}_{j,n_0}(\scriptscriptstyle {\textbf{X}_i, U_i})-Q_{\tau }(\scriptscriptstyle {\textbf{X}_i, U_i})}[F(z|{\textbf {X}}_i,U_i)-F(0|{\textbf {X}}_i,U_i)]dz\\{} & {} \,\,\,\,\,\,\,\,\,\,\,-E_{\scriptscriptstyle {\textbf{X}_i, U_i}}\left\{ \int _0^{\sum _{j=1}^p w_j\widehat{Q}_{j,n_0}(\scriptscriptstyle {\textbf{X}_i, U_i})-Q_{\tau }(\scriptscriptstyle {\textbf{X}_i, U_i})}[F(z|{\textbf {X}}_i,U_i)-F(0|{\textbf {X}}_i,U_i)]dz\right\} \Bigg ], \end{aligned}$$
$$\begin{aligned}{} & {} CV_4({\textbf { w}})=\frac{1}{n-n_0}\sum _{i=n_0+1}^n \Bigg [ E_{\scriptscriptstyle {\textbf{X}_i, U_i}} \Bigg \{\int _0^{\sum _{j=1}^p w_j\widehat{\scriptscriptstyle {Q}}_{j,n_0}(\scriptscriptstyle {\textbf{X}_i, U_i})-Q_{\tau }(\scriptscriptstyle {\textbf{X}_i, U_i})}[F(z|{\textbf {X}}_i,U_i)\\{} & {} \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,-F(0|{\textbf {X}}_i,U_i)]dz\Bigg \}\\{} & {} \,\,\,\,\,\,\,\,\,\,\,-E_{\scriptscriptstyle {\textbf{X}_i, U_i}}\left\{ \int _0^{\sum _{j=1}^p w_j\widehat{\scriptscriptstyle {Q}}_j (\scriptscriptstyle {\textbf{X}_i, U_i})-Q_{\tau }(\scriptscriptstyle {\textbf{X}_i, U_i})}[F(z|{\textbf {X}}_i,U_i)-F(0|{\textbf {X}}_i,U_i)]dz\right\} \Bigg ],\\{} & {} CV_5=\frac{1}{n-n_0}\sum _{i=n_0+1}^n\left[ \rho _\tau (\varepsilon _i) -E\rho _\tau (\varepsilon )\right] . \end{aligned}$$

Next, we will show that

(i) \(\min _{\textbf{w}\in \mathcal {H}}QPE_{n}(\textbf{w})\ge E[\rho _{\tau }(\varepsilon )]-o_{p}(1)\);

(ii) \(\sup _{\textbf{w}\in \mathcal {H}}|CV_{1}(\textbf{w})|=o_{p}(1)\);

(iii) \(\sup _{\textbf{w}\in \mathcal {H}}|CV_{2}(\textbf{w})|=o_{p}(1)\);

(iv) \(\sup _{\textbf{w}\in \mathcal {H}}|CV_{3}(\textbf{w})|=o_{p}(1)\);

(v) \(\sup _{\textbf{w}\in \mathcal {H}}|CV_{4}(\textbf{w})|=o_{p}(1)\);

(vi) \(CV_{5}=o_{p}(1)\).

(i). Using (4) again and defining \(Q_j^*({\textbf {X}},U)=\widetilde{{\textbf {X}}}_{(j)}^{\top }{{\varvec{\theta }}^*_{(-j)}}\), we get

$$\begin{aligned}{} & {} QPE_{n}(\textbf{w})-E[\rho _\tau (\varepsilon +r({\textbf { w}}))]\\{} & {} \quad = E\left\{ \rho _\tau \left( \varepsilon +r({\textbf { w}})+\sum _{j=1}^pw_j[Q_j^*({\textbf {X}},U)-\widehat{Q}_j({\textbf {X}},U)]\right) -\rho _\tau (\varepsilon +r({\textbf { w}}))\bigg |\mathcal {D}_n\right\} \\{} & {} \quad =E\left\{ \int _0^{\sum _{j=1}^pw_j[\widehat{\scriptscriptstyle {Q}}_j(\scriptscriptstyle {\textbf{X}, U})-Q_j^*(\scriptscriptstyle {\textbf{X}, U})]}[I(\varepsilon +r({\textbf { w}})\le z)-I(\varepsilon +r({\textbf { w}})\le 0)]dz\bigg |\mathcal {D}_n\right\} \\{} & {} \quad = E_{\scriptscriptstyle {\textbf{X}, U}}\left\{ \int _0^{\sum _{j=1}^pw_j[\widehat{\scriptscriptstyle {Q}}_j(\scriptscriptstyle {\textbf{X}, U})-Q_j^*(\scriptscriptstyle {\textbf{X}, U})]} [F(z-r({\textbf { w}})|{\textbf {X}},U)-F(-r({\textbf { w}})|{\textbf {X}},U)]dz\right\} , \end{aligned}$$

where \(E_{\textbf{X},U}\) denotes the expectation with respect to \(\{{\textbf {X}},U\}\). By a Taylor expansion and Jensen’s inequality, we have

$$\begin{aligned}{} & {} \Big |QPE_{n}(\textbf{w})-E[\rho _\tau (\varepsilon +r({\textbf { w}}))]\Big |\\{} & {} \quad =\left| E_{\scriptscriptstyle {\textbf{X}, U}}\left\{ \int _0^{\sum _{j=1}^pw_j[\widehat{\scriptscriptstyle {Q}}_j(\scriptscriptstyle {\textbf{X}, U})-Q_j^*(\scriptscriptstyle {\textbf{X}, U})]} zf(-r({\textbf { w}})|{\textbf {X}},U)dz\right\} +o_p(1)\right| \\{} & {} \quad =\left| E_{\scriptscriptstyle {\textbf{X}, U}}\left\{ \frac{1}{2}f(-r({\textbf { w}})|{\textbf {X}},U)\left[ \sum _{j=1}^pw_j [\widehat{Q}_j({\textbf {X}},U)-Q_j^*({\textbf {X}},U)]\right] ^2\right\} +o_p(1)\right| \\{} & {} \quad \le \left| E_{\scriptscriptstyle {\textbf{X}, U}}\left\{ \frac{1}{2}f(-r({\textbf { w}})|{\textbf {X}},U)\sum _{j=1}^pw_j \left[ \widehat{Q}_j({\textbf {X}},U)-Q_j^*({\textbf {X}},U)\right] ^2\right\} +o_p(1)\right| \end{aligned}$$
$$\begin{aligned}{} & {} = \Bigg |\frac{1}{2} E_{\scriptscriptstyle {\textbf{X}, U}}\left\{ \sum _{j=1}^pw_j(\widehat{{\varvec{\theta }}}_{(-j)} -{{\varvec{\theta }}}_{(-j)}^*)^\top E\left[ f(-r({\textbf { w}})|{\textbf {X}},U)\widetilde{{\textbf {X}}}_{(j)}\widetilde{{\textbf {X}}}_{(j)}^ \top \right] (\widehat{{\varvec{\theta }}}_{(-j)}-{{\varvec{\theta }}}_{(-j)}^*)\right\} +o_p(1)\Bigg |\\{} & {} \le \left| \frac{1}{2}\overline{C}_J\max _{1\le j\le p}\Vert \widehat{{\varvec{\theta }}}_{(-j)}-{{\varvec{\theta }}}_{(-j)}^*\Vert +o_p(1)\right| , \end{aligned}$$

where the last inequality follows from Assumption (A11). It then follows from Theorem 1 that

$$\begin{aligned} \Big |QPE_{n}(\textbf{w})-E[\rho _\tau (\varepsilon +r({\textbf { w}}))]\Big |\le o_p(1). \end{aligned}$$
(7)

Using the fact that \(D(t)=E[\rho _{\tau }(\varepsilon +t)-\rho _{\tau }(\varepsilon )]\) has a global minimum at \(t=0\), we have \(\min _{{\textbf { w}}\in \mathcal {H}}E\left[ \rho _\tau (\varepsilon +r({\textbf { w}}))\right] \ge E[\rho _\tau (\varepsilon )].\) Combining this with (7), we get

$$\begin{aligned} \min _{\textbf{w}\in \mathcal {H}}QPE_{n}(\textbf{w}) \ge \min _{\textbf{w}\in \mathcal {H}}E[\rho _\tau (\varepsilon +r({\textbf { w}}))]-o_p(1) \ge E[\rho _\tau (\varepsilon )]-o_p(1). \end{aligned}$$
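
That \(D(t)\) is minimized at \(t=0\) is the defining property of the \(\tau\)-quantile; a quick Monte Carlo sketch (assuming, for concreteness, \(\varepsilon =Z-z_\tau\) with \(Z\sim N(0,1)\), so the \(\tau\)-quantile of \(\varepsilon\) is zero) confirms this:

```python
import numpy as np
from scipy.stats import norm

tau = 0.3
rng = np.random.default_rng(0)
eps = rng.normal(size=10**6) - norm.ppf(tau)   # tau-quantile of eps is 0
rho = lambda u: u * (tau - (u < 0))
ts = np.linspace(-1.0, 1.0, 41)
D = np.array([np.mean(rho(eps + t) - rho(eps)) for t in ts])
assert abs(ts[int(np.argmin(D))]) <= 0.05      # minimizer lands at t = 0
```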

(ii). Define \(Q_j^*({\textbf {X}}_i,U_i)=\widetilde{{\textbf {X}}}_{i(j)}^{\top }{{\varvec{\theta }}^*_{(-j)}}\). By simple calculation, we have the following decomposition,

$$\begin{aligned} \begin{aligned} CV_{1}(\textbf{w})&=\frac{1}{n-n_0}\sum _{i=n_0+1}^{n}\left[ Q_{\tau }({\textbf {X}}_i,U_i)-\sum _{j=1}^{p}w_{j}Q_j^*({\textbf {X}}_i,U_i)\right] \psi _{\tau }(\varepsilon _i)\\&\,\,\,\,-\frac{1}{n-n_0}\sum _{i=n_0+1}^{n}\sum _{j=1}^{p}w_{j}\left[ \widehat{Q}_{j,n_0} ({\textbf {X}}_i,U_i)-Q_j^*({\textbf {X}}_i,U_i)\right] \psi _{\tau }(\varepsilon _i)\\&=CV_{11}(\textbf{w})-CV_{12}(\textbf{w}). \end{aligned} \end{aligned}$$

It is easy to show that \(E(CV_{11}({\textbf { w}}))=0\) and \(\textrm{Var}(CV_{11}({\textbf { w}}))=O(1/(n-n_0)),\) which implies that \(CV_{11}({\textbf { w}})=o_p(1)\) for each \({\textbf { w}}\in \mathcal {H}\). To show uniform convergence, we consider the function class \(\mathcal {F}=\{g(\varepsilon _i,{\textbf {X}}_i,U_i;{\textbf { w}}): {\textbf { w}}\in \mathcal {H}\},\) where \(g(\varepsilon _i,{\textbf {X}}_i,U_i;{\textbf { w}})=\left[ Q_{\tau }({\textbf {X}}_i,U_i)-\sum _{j=1}^{p}w_{j}Q_j^*({\textbf {X}}_i,U_i) \right] \psi _{\tau }(\varepsilon _i)\). On \(\mathcal {H}\), we define the metric \(|\cdot |_1\) by \(|{\textbf { w}}-\tilde{{\textbf { w}}}|_1=\sum _{j=1}^p|w_j-\tilde{w}_j|\) for any \({\textbf { w}}=(w_1,\ldots ,w_p)\in \mathcal {H}\) and \(\tilde{{\textbf { w}}}=(\tilde{w}_1,\ldots ,\tilde{w}_p)\in \mathcal {H}\). Then, the \(\epsilon\)-covering number of \(\mathcal {H}\) with respect to \(|\cdot |_1\) is \(\mathcal {N}(\epsilon ,\mathcal {H},|\cdot |_1)=O(1/\epsilon ^{p-1})\). Further,

$$\begin{aligned} |g(\varepsilon _i,{\textbf {X}}_i,U_i;{\textbf { w}})-g(\varepsilon _i,{\textbf {X}}_i,U_i;\tilde{{\textbf { w}}})|= & {} \left| \sum _{j=1}^p(w_j-\tilde{w}_j)Q_j^*({\textbf {X}}_i,U_i)\psi _\tau (\varepsilon _i)\right| \\\le & {} C_\theta |{\textbf { w}}-\tilde{{\textbf { w}}}|_1\max _{1\le j\le p}\Vert \widetilde{{\textbf {X}}}_{i(j)}\Vert , \end{aligned}$$

where \(C_\theta =p\max _{1\le j\le p}\Vert {{\varvec{\theta }}}_{(-j)}^*\Vert =O(p^{3/2})\) and \(E\max _{1\le j\le p}\Vert \widetilde{{\textbf {X}}}_{i(j)}\Vert <\infty\) by Assumption (A3). For fixed p, this yields that the \(\epsilon\)-bracketing number of \(\mathcal {F}\) with respect to the \(L_1\)-norm satisfies \(\mathcal {N}_{[]}(\epsilon ,\mathcal {F},L_1(P))\le C/\epsilon ^{p-1}\) for some constant C. By Theorem 2.4.1 of Van der Vaart and Wellner (1996), we conclude that \(\mathcal {F}\) is Glivenko–Cantelli, and hence \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{11}({\textbf { w}})|=o_p(1)\). By the Cauchy–Schwarz inequality,

$$\begin{aligned}{} & {} \mathop {\sup }_{\textbf{w}\in \mathcal {H}}|CV_{12}(\textbf{w})| \triangleq \mathop {\sup }_{\textbf{w}\in \mathcal {H}}\left| \frac{1}{n-n_0} \sum _{i=n_0+1}^{n}\sum _{j=1}^{p}w_{j}\left[ \widehat{Q}_{j,n_0}({\textbf {X}}_i,U_i) -Q_j^*({\textbf {X}}_i,U_i)\right] \psi _{\tau }(\varepsilon _i)\right| \\{} & {} \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\le \mathop {\sup }_{\textbf{w}\in \mathcal {H}}\sum _{j=1}^{p}w_{j}\frac{1}{n-n_0}\sum _{i=n_0+1}^{n} \left| \left[ \widetilde{{\textbf {X}}}_{i(j)}^\top (\widehat{{\varvec{\theta }}}_{(-j),n_0} -{{\varvec{\theta }}}_{(-j)}^*)\right] \psi _{\tau }(\varepsilon _i)\right| \\{} & {} \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\le \sum _{j=1}^p\frac{1}{n-n_0}\sum _{i=n_0+1}^n \left| \widetilde{{\textbf {X}}}_{i(j)}^\top (\widehat{{\varvec{\theta }}}_{(-j),n_0} -{{\varvec{\theta }}}_{(-j)}^*)\right| \\{} & {} \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\le \sum _{j=1}^p\max _{n_0+1\le i\le n}\Vert \widetilde{{\textbf {X}}}_{i(j)}\Vert \Vert \widehat{{\varvec{\theta }}}_{(-j),n_0}-{{\varvec{\theta }}}_{(-j)}^*\Vert , \end{aligned}$$

where \(\widehat{{\varvec{\theta }}}_{(-j),n_0}=(\widehat{a}_{j,n_0},\widehat{{\varvec{\beta }}}_{(j),n_0}^{\top })^{\top }\). Then, by Theorem 1 and Assumption (A3), we get \(\mathop {\sup }_{\textbf{w}\in \mathcal {H}}|CV_{12}(\textbf{w})|=o_p(1).\)
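
As a numerical aside, the \(O(1/\epsilon ^{p-1})\) growth of the covering number of the simplex \(\mathcal {H}\) used above can be seen by counting grid points with mesh \(1/k\), \(k=\lceil 1/\epsilon \rceil\) (a sketch; such a grid is an \(\epsilon\)-net in \(|\cdot |_1\) only up to a constant factor):

```python
import itertools
import numpy as np

def simplex_grid_size(p, eps):
    """Number of grid points w = c / k on the simplex (c integer, sum c = k)."""
    k = int(np.ceil(1.0 / eps))
    return sum(1 for c in itertools.product(range(k + 1), repeat=p - 1)
               if sum(c) <= k)

p = 4
for eps in (0.5, 0.25, 0.125):
    n_pts = simplex_grid_size(p, eps)
    print(eps, n_pts, n_pts * eps**(p - 1))   # last column stays bounded
```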

(iii) To prove (iii), we rewrite \(CV_2({\textbf { w}})=CV_{21}({\textbf { w}})+CV_{22}({\textbf { w}})\), where

$$\begin{aligned}{} & {} CV_{21}({\textbf { w}})=\frac{1}{n-n_0}\sum _{i=n_0+1}^n\int _0^{\sum _{j=1}^pw_j{\scriptscriptstyle {Q}}_j^*(\scriptscriptstyle {\textbf{X}_i, U_i}) -Q_{\tau }(\scriptscriptstyle {\textbf{X}_i, U_i})} [I(\varepsilon _i\le z)-I(\varepsilon _i\le 0)\\{} & {} \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,-F(z|{\textbf {X}}_i,U_i)+F(0|{\textbf {X}}_i,U_i)]dz,\\{} & {} CV_{22}({\textbf { w}})=\frac{1}{n-n_0}\sum _{i=n_0+1}^n\int _{\sum _{j=1}^pw_j{\scriptscriptstyle {Q}}_j^*(\scriptscriptstyle {\textbf{X}_i, U_i}) -Q_{\tau }(\scriptscriptstyle {\textbf{X}_i, U_i})}^{\sum _{j=1}^pw_j\widehat{\scriptscriptstyle {Q}}_{j,n_0}(\scriptscriptstyle {\textbf{X}_i, U_i})-Q_{\tau }(\scriptscriptstyle {\textbf{X}_i, U_i})} [I(\varepsilon _i\le z)-I(\varepsilon _i\le 0)\\{} & {} \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,-F(z|{\textbf {X}}_i,U_i)+F(0|{\textbf {X}}_i,U_i)]dz. \end{aligned}$$

Note that \(E[CV_{21}({\textbf { w}})]=0\) and \(\textrm{Var}(CV_{21}({\textbf { w}}))=O(1/(n-n_0))\). Analogous to the proof for \(CV_{11}({\textbf { w}})\), we can show that \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{21}({\textbf { w}})|=o_p(1)\). On the other hand,

$$\begin{aligned} |CV_{22}({\textbf { w}})| \le \frac{2}{n-n_0}\sum _{i=n_0+1}^n\sum _{j=1}^p\left| w_j\left[ \widehat{Q}_{j,n_0}({\textbf {X}}_i,U_i)-Q_j^*({\textbf {X}}_i,U_i) \right] \right| , \end{aligned}$$

and, arguing as for \(CV_{12}({\textbf { w}})\), we have \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{22}({\textbf { w}})|=o_p(1).\)

(iv) We also decompose \(CV_3({\textbf { w}})=CV_{31}({\textbf { w}})+CV_{32}({\textbf { w}})\) with

$$\begin{aligned}{} & {} CV_{31}({\textbf { w}})=\frac{1}{n-n_0}\sum _{i=n_0+1}^n\Bigg \{\int _0^{\sum _{j=1}^pw_j{\scriptscriptstyle {Q}}_j^*(\scriptscriptstyle {\textbf{X}_i, U_i}) -Q_{\tau }(\scriptscriptstyle {\textbf{X}_i, U_i})}[F(z|{\textbf {X}}_i,U_i)-F(0|{\textbf {X}}_i,U_i)]dz\\{} & {} \,\,\,\,\,\,\,\,\,\,\,-E_{\scriptscriptstyle {\textbf{X}_i, U_i}}\left[ \int _0^{\sum _{j=1}^pw_j{\scriptscriptstyle {Q}}_j^*(\scriptscriptstyle {\textbf{X}_i, U_i})-Q_{\tau } (\scriptscriptstyle {\textbf{X}_i, U_i})}[F(z|{\textbf {X}}_i,U_i)-F(0|{\textbf {X}}_i,U_i)]dz\right] \Bigg \},\\ \end{aligned}$$
$$\begin{aligned}{} & {} CV_{32}({\textbf { w}})=\frac{1}{n-n_0}\sum _{i=n_0+1}^n\Bigg \{\int _{\sum _{j=1}^pw_jQ_j^*(\scriptscriptstyle {\textbf{X}_i, U_i}) -Q_{\tau }(\scriptscriptstyle {\textbf{X}_i, U_i})}^{\sum _{j=1}^pw_j\widehat{\scriptscriptstyle {Q}}_{j,n_0}(\scriptscriptstyle {\textbf{X}_i, U_i})-Q_{\tau }(\scriptscriptstyle {\textbf{X}_i, U_i})}[F(z|{\textbf {X}}_i,U_i)-F(0|{\textbf {X}}_i,U_i)]dz\\{} & {} \,\,\,\,\,\,\,\,\,\,\,-E_{\scriptscriptstyle {\textbf{X}_i, U_i}}\left[ \int _{\sum _{j=1}^pw_j{\scriptscriptstyle {Q}}_j^*(\scriptscriptstyle {\textbf{X}_i, U_i}) -Q_{\tau }(\scriptscriptstyle {\textbf{X}_i, U_i})}^{\sum _{j=1}^pw_j\widehat{\scriptscriptstyle {Q}}_{j,n_0}(\scriptscriptstyle {\textbf{X}_i, U_i})-Q_{\tau }(\scriptscriptstyle {\textbf{X}_i, U_i})}[F(z|{\textbf {X}}_i,U_i) -F(0|{\textbf {X}}_i,U_i)]dz\right] \Bigg \}. \end{aligned}$$

Similar to the proof of \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{11}({\textbf { w}})|=o_p(1),\) we can show that \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{31}({\textbf { w}})|=o_p(1)\); the details are omitted here.

Note that

$$\begin{aligned} |CV_{32}({\textbf { w}})|\le & {} \frac{1}{n-n_0}\sum _{i=n_0+1}^n\sum _{j=1}^pw_j\left| \widehat{Q}_{j,n_0}({\textbf {X}}_i,U_i)-Q_j^*({\textbf {X}}_i,U_i)\right| \\{} & {} \,\,+\frac{1}{n-n_0}\sum _{i=n_0+1}^n\sum _{j=1}^pw_jE_{\scriptscriptstyle {\textbf{X}_i, U_i}} \left| \widehat{Q}_{j,n_0}({\textbf {X}}_i,U_i)-Q_j^*({\textbf {X}}_i,U_i)\right| \\= & {} CV_{321}({\textbf { w}})+ CV_{322}({\textbf { w}}). \end{aligned}$$

We can prove that \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{321}({\textbf { w}})|=o_p(1)\) as in the proof for \(CV_{22}({\textbf { w}})\). Furthermore, by the Cauchy–Schwarz inequality, we have

$$\begin{aligned}{} & {} \sup _{{\textbf { w}}\in \mathcal {H}}|CV_{322}({\textbf { w}})|\\{} & {} \quad \le \sup _{{\textbf { w}}\in \mathcal {H}}\frac{1}{n-n_0}\sum _{i=n_0+1}^n\sum _{j=1}^pw_j \left\{ (\widehat{{\varvec{\theta }}}_{(-j),n_0}-{{\varvec{\theta }}}_{(-j)}^*)^\top E[\widetilde{{\textbf {X}}}_{i(j)} \widetilde{{\textbf {X}}}_{i(j)}^\top ](\widehat{{\varvec{\theta }}}_{(-j),n_0}-{{\varvec{\theta }}}_{(-j)}^*)\right\} ^{1/2}\\{} & {} \quad \le \max _{n_0+1\le i\le n}\max _{1\le j\le p}\lambda _{\max }^{1/2}E[\widetilde{{\textbf {X}}}_{i(j)}\widetilde{{\textbf {X}}}_{i(j)}^\top ]\max _{1\le j\le p}\Vert \widehat{{\varvec{\theta }}}_{(-j),n_0}-{{\varvec{\theta }}}_{(-j)}^*\Vert . \end{aligned}$$

By Assumption (A8) and Theorem 1, we have \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{322}({\textbf { w}})|=o_p(1).\)

(v) To prove (v), we note that

$$\begin{aligned} \sup _{{\textbf { w}}\in \mathcal {H}}|CV_4({\textbf { w}})|\le & {} \frac{1}{n-n_0}\sum _{i=n_0+1}^nE_{\scriptscriptstyle {\textbf{X}_i, U_i}} \left| \sum _{j=1}^pw_j\widehat{Q}_{j,n_0}({\textbf {X}}_i,U_i)-\sum _{j=1}^pw_j\widehat{Q}_j({\textbf {X}}_i,U_i)\right| \\\le & {} \frac{1}{n-n_0}\sum _{i=n_0+1}^n\sum _{j=1}^pw_jE_{\scriptscriptstyle {\textbf{X}_i, U_i}}\left| \widehat{Q}_{j,n_0} ({\textbf {X}}_i,U_i)-\widehat{Q}_j({\textbf {X}}_i,U_i)\right| , \end{aligned}$$

and, following the proof of \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{322}({\textbf { w}})|=o_p(1)\), we obtain (v).

(vi) \(CV_5=o_p(1)\) follows from the weak law of large numbers.

This completes the proof of Theorem 3. \(\square\)

A.4. Proof of Theorem 4

According to the proof of Theorem 1, we can further obtain that \(\Vert \widehat{{\varvec{\theta }}}_{(j)}-{{\varvec{\theta }}}^{*}_{(j)}\Vert =o_p(1)\) and \(\Vert \widehat{{\varvec{\theta }}}_{(j),n_0}-{{\varvec{\theta }}}^{*}_{(j)}\Vert =o_p(1)\) uniformly for all \(\tau \in \mathcal {T}\); that is, \(\sup _{\tau \in \mathcal {T}}\Vert \widehat{{\varvec{\theta }}}_{(j)}-{{\varvec{\theta }}}^{*}_{(j)}\Vert =o_p(1)\) and \(\sup _{\tau \in \mathcal {T}}\Vert \widehat{{\varvec{\theta }}}_{(j),n_0}-{{\varvec{\theta }}}^{*}_{(j)}\Vert =o_p(1)\).

In the following, we prove that \(\widehat{\textbf{w}}\) is asymptotically optimal uniformly for \(\tau \in \mathcal {T}\). The proof is analogous to that of Theorem 3, but is more challenging because the asymptotic optimality of \(\widehat{\textbf{w}}\) is required to hold uniformly over the set of quantile indices. Specifically, we need to prove that (i)–(vi) in Theorem 3 hold with \(\textbf{w}\in \mathcal {H}\) replaced by \((\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}\).

(a) According to the proof of (i) in Theorem 3, it is easy to obtain that

$$\begin{aligned} \min _{\textbf{w}\in \mathcal {H}}QPE_{\tau ,n}(\textbf{w}) \ge \min _{\textbf{w}\in \mathcal {H}}E[\rho _\tau (\varepsilon +r({\textbf { w}}))]-o_p(1) \ge E[\rho _\tau (\varepsilon )]-o_p(1), \quad \text {for all } \tau \in \mathcal {T}, \end{aligned}$$

that is, \(\mathop {\inf }\limits _{(\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}}QPE_{\tau ,n}(\textbf{w})\ge E[\rho _{\tau }(\varepsilon )]-o_{p}(1)\).

(b) We have \(CV_1(\textbf{w})=CV_{11}(\textbf{w})-CV_{12}(\textbf{w})\). To prove (b), it suffices to show that \(\sup _{(\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}}|CV_{11}(\textbf{w})|=o_p(1)\) and \(\sup _{(\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}}|CV_{12}(\textbf{w})|=o_p(1)\). Similar to the proof of (ii) in Theorem 3, to show the uniform convergence, we consider the class of functions \(\mathcal {G}=\{g(\varepsilon _i,\textbf{X}_i,U_i;\textbf{w},\tau ):(\textbf{w},\tau )\in \mathcal {H}\times \mathcal {T}\},\) where \(g(\varepsilon _i,{\textbf {X}}_i,U_i;{\textbf { w}},\tau )=\left[ Q_{\tau }({\textbf {X}}_i,U_i) -\sum _{j=1}^{p}w_{j}Q_j^*({\textbf {X}}_i,U_i)\right] \psi _{\tau }(\varepsilon _i)\). On \(\mathcal {H}\times \mathcal {T}\), we define the metric \(|\cdot |_1^t\) by \(|({\textbf { w}},\tau )-(\tilde{{\textbf { w}}},\tilde{\tau })|_1^{t}=\sum _{j=1}^p|w_j-\tilde{w}_j|+|\tau -\tilde{\tau }|\). Then, the \(\epsilon\)-covering number satisfies \(\mathcal {N}(\epsilon ,\mathcal {H}\times \mathcal {T},|\cdot |_1^{t})=O(1/\epsilon ^{p})\). Further, the \(\epsilon\)-bracketing number satisfies \(\mathcal {N}_{[]}(\epsilon ,\mathcal {G},L_1(P))\le C/\epsilon ^{p}\), and it follows from the Glivenko–Cantelli theorem that \(\sup _{(\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}}|CV_{11}(\textbf{w})|=o_p(1)\).

We also have

$$\begin{aligned} \mathop {\sup }\limits _{(\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}}|CV_{12}(\textbf{w})|\le \sum _{j=1}^p\max _{n_0+1\le i\le n}\Vert \widetilde{{\textbf {X}}}_{i(j)}\Vert \mathop {\sup }\limits _{\tau \in \mathcal {T}}\Vert \widehat{{\varvec{\theta }}}_{(-j),n_0}-{{\varvec{\theta }}}_{(-j)}^*\Vert =o_p(1). \end{aligned}$$

Hence

$$\begin{aligned} \mathop {\sup }\limits _{(\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}}|CV_{1}(\textbf{w})|\le \mathop {\sup }\limits _{(\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}}|CV_{11}(\textbf{w})|+\mathop {\sup }\limits _{(\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}}|CV_{12}(\textbf{w})|=o_{p}(1). \end{aligned}$$

Similarly, claims (iii), (iv), (v) and (vi) follow from the corresponding proofs in Theorem 3 together with the facts that \(\sup _{\tau \in \mathcal {T}}\Vert \widehat{{\varvec{\theta }}}_{(j)}-{{\varvec{\theta }}}^{*}_{(j)}\Vert =o_p(1)\) and \(\sup _{\tau \in \mathcal {T}}\Vert \widehat{{\varvec{\theta }}}_{(j),n_0}-{{\varvec{\theta }}}^{*}_{(j)}\Vert =o_p(1)\). This completes the proof of Theorem 4. \(\square\)

About this article

Cite this article

Zhan, Z., Li, Y., Yang, Y. et al. Model averaging for semiparametric varying coefficient quantile regression models. Ann Inst Stat Math 75, 649–681 (2023). https://doi.org/10.1007/s10463-022-00857-z
