Abstract
In this study, we propose a model averaging approach to estimating conditional quantiles based on a set of semiparametric varying coefficient models. In contrast to the existing literature on the subject, we consider a particular form for all candidates, in which each sub-model contains only one varying coefficient, and all candidates under investigation may be misspecified. We propose a weight choice criterion based on a leave-more-out cross-validation objective function. Moreover, the resulting averaging estimator is more robust against model misspecification because the weighted coefficients adjust the relative importance of the varying and constant coefficients for the same predictors. We establish statistical properties for each sub-model and the asymptotic optimality of the weight selection method. Simulation studies show that the proposed procedure has satisfactory prediction accuracy. An analysis of skin cutaneous melanoma data further supports the merits of the proposed approach.
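The weight-selection idea can be illustrated with a toy numerical sketch. This is not the paper's estimator: the semiparametric varying-coefficient sub-models are replaced by simple linear quantile fits via subgradient descent, and the data-generating process, candidate set (p = 2), and grid search are invented for illustration.

```python
import numpy as np

def pinball(u, tau):
    # check (pinball) loss rho_tau(u)
    return np.maximum(tau * u, (tau - 1.0) * u)

def fit_linear_quantile(X, y, tau, lr=0.05, iters=2000):
    # subgradient descent on the pinball loss; a crude stand-in for the
    # paper's semiparametric sub-model estimators
    beta = np.zeros(X.shape[1])
    for t in range(iters):
        r = y - X @ beta
        s = np.where(r > 0, tau, tau - 1.0)  # subgradient of rho_tau at r
        beta += lr / np.sqrt(t + 1.0) * (X * s[:, None]).mean(axis=0)
    return beta

def cv_weights(preds_val, y_val, tau, grid=101):
    # leave-n0-out CV criterion: choose simplex weights minimizing the
    # held-out pinball loss (a 1-D grid suffices for two candidates)
    ws = np.linspace(0.0, 1.0, grid)
    losses = [pinball(y_val - (w * preds_val[:, 0]
                               + (1 - w) * preds_val[:, 1]), tau).mean()
              for w in ws]
    w = ws[int(np.argmin(losses))]
    return np.array([w, 1.0 - w])

rng = np.random.default_rng(0)
n, n0, tau = 400, 100, 0.5
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
Xtr, ytr, Xva, yva = X[:-n0], y[:-n0], X[-n0:], y[-n0:]

# two candidate sub-models, each using one predictor plus an intercept
designs_tr = [np.c_[np.ones(n - n0), Xtr[:, j]] for j in range(2)]
designs_va = [np.c_[np.ones(n0), Xva[:, j]] for j in range(2)]
betas = [fit_linear_quantile(D, ytr, tau) for D in designs_tr]
preds_val = np.column_stack([D @ b for D, b in zip(designs_va, betas)])
w_hat = cv_weights(preds_val, yva, tau)
```

By construction the averaged predictor's held-out loss is no worse than either single candidate's, mirroring in miniature the asymptotic optimality the theorems establish.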
References
Angrist, J., Chernozhukov, V., Fernández-Val, I. (2006). Quantile regression under misspecification, with an application to the U.S. wage structure. Econometrica, 74, 539–563.
Cai, Z., Xiao, Z. (2012). Semiparametric quantile regression estimation in dynamic models with partially varying coefficients. Journal of Econometrics, 167, 413–425.
Cai, Z., Xu, X. (2008). Nonparametric quantile estimations for dynamic smooth coefficient models. Journal of the American Statistical Association, 103, 1595–1608.
Cai, Z., Chen, L., Fang, Y. (2018). A semiparametric quantile panel data model with an application to estimating the growth effect of FDI. Journal of Econometrics, 206, 531–553.
Chai, H., Shi, X., Zhang, Q., Zhao, Q., Huang, Y., Ma, S. (2017). Analysis of cancer gene expression data with an assisted robust marker identification approach. Genetic Epidemiology, 41, 779–789.
Fitzenberger, B., Koenker, R., Machado, J. (Eds.). (2002). Economic application of quantile regression. Heidelberg, Germany: Physica Verlag.
Hansen, B. E. (2007). Least squares model averaging. Econometrica, 75, 1175–1189.
Hjort, N. L., Claeskens, G. (2003). Frequentist model average estimators. Journal of the American Statistical Association, 98, 879–899.
Kai, B., Li, R., Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. The Annals of Statistics, 39, 305–332.
Knight, K. (1998). Limiting distributions for \(L_1\) regression estimators under general conditions. Annals of Statistics, 26, 755–770.
Koenker, R., Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33–50.
Kuester, K., Mittnik, S., Paolella, M. (2006). Value-at-risk prediction: A comparison of alternative strategies. Journal of Financial Econometrics, 4, 53–89.
Li, D., Linton, O., Lu, Z. (2015). A flexible semiparametric forecasting model for time series. Journal of Econometrics, 187, 345–357.
Li, G., Li, Y., Tsai, C. L. (2015). Quantile correlations and quantile autoregressive modeling. Journal of the American Statistical Association, 110, 246–261.
Li, J., Xia, X., Wong, W. K., Nott, D. (2018). Varying-coefficient semiparametric model averaging prediction. Biometrics, 74, 1417–1426.
Li, X., Ma, X., Zhang, J. (2018). Conditional quantile correlation screening procedure for ultrahigh-dimensional varying coefficient models. Journal of Statistical Planning and Inference, 197, 62–92.
Li, Y., Graubard, B. I., Korn, E. L. (2010). Application of nonparametric quantile regression to body mass index percentile curves from survey data. Statistics in Medicine, 29, 558–572.
Lian, H. (2015). Quantile regression for dynamic partially linear varying coefficient time series models. Journal of Multivariate Analysis, 141, 49–66.
Lin, H., Fei, Z., Li, Y. (2016). A semiparametrically efficient estimator of the time-varying effects for survival data with time-dependent treatment. Scandinavian Journal of Statistics, 43, 649–663.
Liu, J., Huang, J., Zhang, Y., Lan, Q., Rothman, N., Zheng, T., Ma, S. (2013). Identification of gene-environment interactions in cancer studies using penalization. Genomics, 102, 189–194.
Lu, X., Su, L. (2015). Jackknife model averaging for quantile regressions. Journal of Econometrics, 188, 40–58.
Ma, S., Yang, L., Romero, R., Cui, Y. (2011). Varying coefficient model for gene-environment interaction: A non-linear look. Bioinformatics, 27, 2119–2126.
Mack, Y., Silverman, B. (1982). Weak and strong uniform consistency of kernel regression estimates. Probability Theory and Related Fields, 61, 405–415.
Nan, Y., Yang, Y. (2014). Variable selection diagnostics measures for high-dimensional regression. Journal of Computational and Graphical Statistics, 23, 636–656.
Shan, K., Yang, Y. (2009). Combining regression quantile estimators. Statistica Sinica, 19, 1171–1191.
Sharafeldin, N., Slattery, M. L., Liu, Q., Franco-Villalobos, C., Caan, B. J., Potter, J. D., Yasui, Y. (2015). A candidate-pathway approach to identify gene-environment interactions: Analyses of colon cancer risk and survival. Journal of the National Cancer Institute, 107(9), djv160.
Shen, Y., Liang, H. (2017). Quantile regression for partially linear varying-coefficient model with censoring indicators missing at random. Computational Statistics and Data Analysis, 117, 1–18.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Stock, J., Watson, M. (2004). Combination forecasts of output growth in a seven-country data set. Journal of Forecasting, 23, 405–430.
Van der Vaart, A., Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer.
Wang, M., Zhang, X., Wan, A. T. K., You, K., Zou, G. (2021). Jackknife model averaging for high-dimensional quantile regression. Biometrics, 1–12.
Wheelock, D. C., Wilson, P. W. (2008). Non-parametric, unconditional quantile estimation for efficiency analysis with an application to Federal Reserve check processing operations. Journal of Econometrics, 145, 209–225.
Winnepenninckx, V., Lazar, V., Michiels, S., Dessen, P., Stas, M., Alonso, S. R., Avril, M., Romero, P. L., Robert, T., Balacescu, O., Eggermont, A. M., Lenoir, G., Sarasin, A., Tursz, T., Oord, J. J., Spatz, A. (2006). Gene expression profiling of primary cutaneous melanoma and clinical outcome. Journal of the National Cancer Institute, 98, 472–482.
Wu, M., Huang, J., Ma, S. (2017). Identifying gene-gene interactions using penalized tensor regression. Statistics in Medicine, 37, 598–610.
Xu, Y., Wu, M., Ma, S., Ahmed, S. (2018). Robust gene environment interaction analysis using penalized trimmed regression. Journal of Statistical Computation and Simulation, 88, 3502–3528.
Yang, Y. (2001). Adaptive regression by mixing. Journal of the American Statistical Association, 96, 574–588.
Yang, Y. (2007). Prediction/estimation with simple linear models: Is it really that simple? Econometric Theory, 23, 1–36.
Ye, C., Yang, Y., Yang, Y. (2018). Sparsity oriented importance learning for high-dimensional linear regression. Journal of the American Statistical Association, 113, 1797–1812.
Zhan, Z., Yang, Y. (2022). Profile electoral college cross-validation. Information Sciences, 586, 24–40.
Zhu, R., Wan, A. T. K., Zhang, X., Zou, G. (2019). A Mallows-type model averaging estimator for the varying coefficient partially linear model. Journal of the American Statistical Association, 114, 882–892.
Acknowledgements
Lin’s work was supported by the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (No. 19XNB014).
Appendix
A.1. Proof of Theorem 1
We first introduce the following lemma, which is a direct result of Mack and Silverman (1982) and will be used in our proofs.
Lemma 1
Let \((\textbf{X}_{1},Y_{1}),... ,(\textbf{X}_{n},Y_{n})\) be i.i.d. random vectors, where \(Y_{1},... , Y_n\) are scalar random variables. Assume that \(E|Y|^{r}<\infty\) and \(\sup _\textbf{x}\int |y|^{r}f(\textbf{x},y)dy<\infty\), where f denotes the joint density of \((\textbf{X},Y)\). Let K be a bounded positive function with bounded support, satisfying a Lipschitz condition. Then,
provided that \(n^{2\eta -1}h \rightarrow \infty\) for some \(\eta <1-r^{-1}\).
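For completeness, the bound in Lemma 1 is usually quoted in the following standard form (a reconstruction in the notation above, not the paper's exact display; \(D\) denotes the compact set over which the supremum is taken):

```latex
\sup_{\mathbf{x}\in D}\left|\frac{1}{n}\sum_{i=1}^{n}
  \Bigl\{K_h(\mathbf{X}_i-\mathbf{x})\,Y_i
        - E\bigl[K_h(\mathbf{X}_i-\mathbf{x})\,Y_i\bigr]\Bigr\}\right|
  = O_p\!\left(\left\{\frac{\log(1/h)}{nh}\right\}^{1/2}\right).
```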
Now, we prove the results of Theorem 1. First, we recall Knight's identity (Knight 1998), which will be used repeatedly in what follows: \(\rho _{\tau }(u-v)-\rho _{\tau }(u)=-v\psi _{\tau }(u)+\int _0^v\{I(u\le s)-I(u\le 0)\}\,ds\), where \(\psi _{\tau }(u)=\tau -I(u<0)\).
For given u, \(\tau\), and j, define \(\varepsilon _{\tau ,i}=Y_i-Q_{\tau }({\textbf{X}}_i,U_i)\) and \(r_{i(j)}=Q_{\tau }({\textbf {X}}_i,U_i)-{{\textbf {X}}}_{i(j)}^{\top }{{\varvec{\theta }}}^*_{(j)}\). Recall that \({{\varvec{\theta }}}_{(j)}^{*}=(a_{j}^{*},b^{*}_{j},{{\varvec{\beta }}}^{*\top }_{(j)})^{\top }\). To simplify the notation, we write \(\varepsilon _i\) and \(\varepsilon\) for \(\varepsilon _{\tau ,i}\) and \(\varepsilon _{\tau }\), respectively. For \({{\varvec{\theta }}}_{(j)}\in \mathbb {R}^{p+1}\), we define
We will show that for any \(s>0\), there is a constant \(M>0\) such that for all n sufficiently large, we have
where \({\varvec{\theta }}_{(j)}^{s}={\varvec{\theta }}_{(j)}^{*}+\delta _s\textbf{v}_{(j)}\), \(\textbf{v}_{(j)}=(v_1,... ,v_{p+1})^{\top }\), and \(\delta _s=o(1)\). By Knight's identity, we obtain
where
By equation (2), we have \(E\left[ G_{j, 1}\left( \textbf{v}_{(j)}\right) \right] =0\). By Assumptions (A6) and (A9),
where \(C_K\) is a finite positive constant. Hence, \(G_{j, 1}\left( \textbf{v}_{(j)}\right) =O_p\left( \overline{C}_{B_{(j)}}^{1/2}\delta _s\sqrt{n}\right) \Vert \textbf{v}_{(j)}\Vert\). By Taylor expansion, we have
Analogous to \(G_{j, 1}\left( \textbf{v}_{(j)}\right)\), by Assumption (A8), we obtain \(G_{j, 3}\left( \textbf{v}_{(j)}\right) =O_p(\overline{C}_{A_{(j)}}^{1/2}\delta _s\sqrt{n})\Vert \textbf{v}_{(j)}\Vert\). Thus we obtain (5), which implies that, with probability approaching 1, there exists a local minimum \(\widehat{{\varvec{\theta }}}_{(j)}\) in the ball \(\mathcal {B}_{M,\delta _s}=\left\{ {\varvec{\theta }}_{(j)}^{*}+\delta _s\textbf{v}_{(j)}:\Vert \textbf{v}_{(j)}\Vert \le M\right\}\) such that \(\Vert \widehat{{\varvec{\theta }}}_{(j)}-{{\varvec{\theta }}}_{(j)}^{*}\Vert =O_p(\delta _s)=o_p(1)\). By the convexity of \(G_j(\tau ,{\varvec{\theta }}_{(j)})\), \(\widehat{{\varvec{\theta }}}_{(j)}\) is also the global minimum. This proves Theorem 1. \(\square\)
A.2. Proof of Theorem 2
For given \(\tau\) and j, recall that \(\varepsilon _{i}=Y_i-Q_{\tau }({\textbf{X}}_i,U_i),\) \(r_{i(j)}=Q_{\tau }({\textbf {X}}_i,U_i)-{{\textbf {X}}}_{i(j)}^{\top }{{\varvec{\theta }}}^*_{(j)}\). Let \(\widehat{{\varvec{\omega }}}_{j}=\sqrt{nh_{j}}\left( \widehat{a}_{j}-a_j^{*}, \widehat{{\varvec{\beta }}}^{\top }_{(j)}-{{\varvec{\beta }}}^{*\top }_{(j)},h_{j}(\widehat{b}_{j}-b_j^{*})\right) ^{\top }\). It follows from Theorem 1 in Cai and Xu (2008) that
where
where \({{\textbf {X}}}^{\circ }_{i(j)}=\left( X_{ij},X_{i1},... \ ,X_{i(j-1)},X_{i(j+1)},... \ ,X_{ip},(U_{i}-u)X_{ij}/h_{j}\right) ^{\top }\). So we have
where \(\widetilde{\textbf{W}}_{n,j}(u)=\frac{1}{\sqrt{nh_{j}}}\sum _{i=1}^{n}K_{h_{j}} (U_{i}-u)\psi _{\tau }(\varepsilon _i+r_{i(j)})\widetilde{\textbf{X}}_{i(j)}\), and \(\widetilde{{\textbf {X}}}_{i(j)}=(X_{ij},X_{i1},... ,X_{i(j-1)},X_{i(j+1)},... ,X_{ip})^\top\). Noting that \(E\left( \widetilde{\textbf{W}}_{n,j}(u)\right) =0\) by (2), and
Then, for any \(\epsilon >0\), defining \(\eta _{i(j)}=(nh_j)^{-1/2}K_{h_j}(U_i-u)\psi _{\tau }(\varepsilon _i+r_{i(j)})\widetilde{{\textbf {X}}}_{i(j)}\), we have
Furthermore, by Assumptions (A6) and (A10),
Thus, \(\sum _{i=1}^{n}E\left\{ \Vert \eta _{i(j)}\Vert ^2I\left[ \Vert \eta _{i(j)}\Vert \ge \epsilon \right] \right\} =O\left( (nh_j^2)^{-1}\right) =o(1).\) According to the Lindeberg–Feller central limit theorem, we obtain
By Slutsky's theorem, we have
Therefore, the proof of Theorem 2 is completed. \(\square\)
A.3. Proof of Theorem 3
To show the results, it suffices to show that \(\sup _{\textbf{w}\in \mathcal {H}}\left| \frac{CV_{n_0}(\textbf{w})-QPE_{n}(\textbf{w})}{QPE_{n}(\textbf{w})}\right| =o_{p}(1).\) For notational simplicity, for a given \(\tau\), let \(\widehat{Q}_{j,n_0}(\cdot )=\widehat{Q}_{\tau ,n_0}^{(j)}(\cdot )\) and \(\widehat{Q}_{j}(\cdot )=\widehat{Q}_{\tau }^{(j)}(\cdot )\). By the definitions of \(CV_{n_0}({\textbf { w}})\) and \(QPE_{n}(\textbf{w})\), we have
Noting that \(E\left[ \left( Q_{\tau }({\textbf {X}},U)-\sum _{j=1}^pw_j\widehat{Q}_j({\textbf {X}},U)\right) \psi _\tau (\varepsilon )\bigg |\mathcal {D}_n\right] =0\) and
where \(E_{\textbf{X}_i,U_i}\) denotes the expectation with respect to \(\{{\textbf {X}}_i,U_i\}\). Together with Knight's identity, we obtain the decomposition
where
Next, we will show that
(i) \(\min _{\textbf{w}\in \mathcal {H}}QPE_{n}(\textbf{w})\ge E[\rho _{\tau }(\varepsilon )]-o_{p}(1)\);
(ii) \(\sup _{\textbf{w}\in \mathcal {H}}|CV_{1}(\textbf{w})|=o_{p}(1)\);
(iii) \(\sup _{\textbf{w}\in \mathcal {H}}|CV_{2}(\textbf{w})|=o_{p}(1)\);
(iv) \(\sup _{\textbf{w}\in \mathcal {H}}|CV_{3}(\textbf{w})|=o_{p}(1)\);
(v) \(\sup _{\textbf{w}\in \mathcal {H}}|CV_{4}(\textbf{w})|=o_{p}(1)\);
(vi) \(CV_{5}=o_{p}(1)\).
(i) Using (4) again and defining \(Q_j^*({\textbf {X}},U)=\widetilde{{\textbf {X}}}_{(j)}^{\top }{{\varvec{\theta }}^*_{(-j)}}\), we get
where \(E_{\textbf{X},U}\) denotes the expectation with respect to \(\{{\textbf {X}},U\}\). By Taylor expansion and Jensen's inequality, we have
the last inequality follows from Assumption (A11). Now it follows from Theorem 1 that,
Using the fact that \(D(t)=E[\rho _{\tau }(\varepsilon +t)-\rho _{\tau }(\varepsilon )]\) has a global minimum at \(t=0\), we have \(\min _{{\textbf { w}}\in \mathcal {H}}E\left[ \rho _\tau (\varepsilon +r({\textbf { w}}))\right] \ge E[\rho _\tau (\varepsilon )].\) Combining this with (7), we get
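The global-minimum claim for \(D(t)\) can be verified directly; a standard one-line argument, where \(F_{\varepsilon }\) and \(f_{\varepsilon }\) (symbols introduced here) denote the distribution and density functions of \(\varepsilon\):

```latex
D'(t) = E\bigl[\tau - I(\varepsilon + t < 0)\bigr] = \tau - F_{\varepsilon}(-t),
\qquad
D''(t) = f_{\varepsilon}(-t) \ge 0,
```

so \(D\) is convex with \(D'(0)=\tau -F_{\varepsilon }(0)=0\), because \(P(\varepsilon \le 0)=\tau\) by the definition of the conditional \(\tau\)-quantile; hence \(t=0\) is a global minimizer.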
(ii) Define \(Q_j^*({\textbf {X}}_i,U_i)=\widetilde{{\textbf {X}}}_{i(j)}^{\top }{{\varvec{\theta }}^*_{(-j)}}\). A direct calculation gives the decomposition
It is easy to show that \(E(CV_{11}({\textbf { w}}))=0\) and \(\textrm{Var}(CV_{11}({\textbf { w}}))=O(1/(n-n_0)),\) which implies that \(CV_{11}({\textbf { w}})=o_p(1)\) for each \({\textbf { w}}\in \mathcal {H}\). To show the uniform convergence, we consider the function class \(\mathcal {F}=\{g(\varepsilon _i,{\textbf {X}}_i,U_i;{\textbf { w}}): {\textbf { w}}\in \mathcal {H}\},\) where \(g(\varepsilon _i,{\textbf {X}}_i,U_i;{\textbf { w}})=\left[ Q_{\tau }({\textbf {X}}_i,U_i)-\sum _{j=1}^{p}w_{j}Q_j^*({\textbf {X}}_i,U_i) \right] \psi _{\tau }(\varepsilon _i)\). On \(\mathcal {H}\), we define the metric \(|\cdot |_1\) as \(|{\textbf { w}}-\tilde{{\textbf { w}}}|_1=\sum _{j=1}^p|w_j-\tilde{w}_j|\), for any \({\textbf { w}}=(w_1,... ,w_p)\in \mathcal {H}\) and \(\tilde{{\textbf { w}}}=(\tilde{w}_1,... ,\tilde{w}_p)\in \mathcal {H}\). Then, the \(\epsilon\)-covering number of \(\mathcal {H}\) with respect to \(|\cdot |_1\) is \(\mathcal {N}(\epsilon ,\mathcal {H},|\cdot |_1)=O(1/\epsilon ^{p-1})\). Further,
where \(C_\theta =p\max _{1\le j\le p}\Vert {{\varvec{\theta }}}_{(-j)}^*\Vert =O(p^{3/2})\) and \(E\max _{1\le j\le p}\Vert \widetilde{{\textbf {X}}}_{i(j)}\Vert <\infty\) by Assumption (A3). For a fixed p, this yields that the \(\epsilon\)-bracketing number of \(\mathcal {F}\) with respect to the \(L_1\)-norm is \(\mathcal {N}_{[]}(\epsilon ,\mathcal {F},L_1(P))\le C/\epsilon ^{p-1}\) for some constant C. By Theorem 2.4.1 of Van der Vaart and Wellner (1996), we conclude that \(\mathcal {F}\) is a Glivenko–Cantelli class, and hence \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{11}({\textbf { w}})|=o_p(1)\). By the Cauchy–Schwarz inequality,
where \(\widehat{{\varvec{\theta }}}_{(-j),n_0}=(\widehat{a}_{j,n_0},\widehat{{\varvec{\beta }}}_{(j),n_0}^{\top })^{\top }\), then by Theorem 1 and Assumption (A3), we get \(\mathop {\sup }_{\textbf{w}\in \mathcal {H}}|CV_{12}(\textbf{w})|=o_p(1).\)
(iii) To prove (iii), we rewrite \(CV_2({\textbf { w}})=CV_{21}({\textbf { w}})+CV_{22}({\textbf { w}})\), where
Note that \(E[CV_{21}({\textbf { w}})]=0\) and \(\textrm{Var}(CV_{21}({\textbf { w}}))=O(1/(n-n_0))\). Analogous to the proof for \(CV_{11}({\textbf { w}})\), we can show that \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{21}({\textbf { w}})|=o_p(1)\). On the other hand,
similarly to \(CV_{12}({\textbf { w}})\), we have \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{22}({\textbf { w}})|=o_p(1).\)
(iv) We also decompose \(CV_3({\textbf { w}})=CV_{31}({\textbf { w}})+CV_{32}({\textbf { w}})\) with
Similar to the proof of \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{11}({\textbf { w}})|=o_p(1),\) we can show that \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{31}({\textbf { w}})|=o_p(1)\); the details are omitted here.
Noting that
We can prove that \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{321}({\textbf { w}})|=o_p(1)\) by the same argument as for \(CV_{22}({\textbf { w}})\). Furthermore, by the Cauchy–Schwarz inequality, we have
By Assumption (A8) and Theorem 1, we have \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{322}({\textbf { w}})|=o_p(1).\)
(v) To prove (v), we note that
following the proof of \(\sup _{{\textbf { w}}\in \mathcal {H}}|CV_{322}({\textbf { w}})|=o_p(1)\), we obtain (v).
(vi) \(CV_5=o_p(1)\) follows from the weak law of large numbers.
This completes the proof of Theorem 3. \(\square\)
A.4. Proof of Theorem 4
According to the proof of Theorem 1, we can further obtain that \(\Vert \widehat{{\varvec{\theta }}}_{(j)}-{{\varvec{\theta }}}^{*}_{(j)}\Vert =o_p(1)\) and \(\Vert \widehat{{\varvec{\theta }}}_{(j),n_0}-{{\varvec{\theta }}}^{*}_{(j)}\Vert =o_p(1)\) uniformly for all \(\tau \in \mathcal {T}\). That is to say, we have \(\sup _{\tau \in \mathcal {T}}\Vert \widehat{{\varvec{\theta }}}_{(j)}-{{\varvec{\theta }}}^{*}_{(j)}\Vert =o_p(1)\) and \(\sup _{\tau \in \mathcal {T}}\Vert \widehat{{\varvec{\theta }}}_{(j),n_0}-{{\varvec{\theta }}}^{*}_{(j)}\Vert =o_p(1)\).
In the following, we prove that \(\widehat{\textbf{w}}\) is asymptotically optimal uniformly for \(\tau \in \mathcal {T}\). The proof is analogous to that of Theorem 3, but is more challenging because the asymptotic optimality of \(\widehat{\textbf{w}}\) must hold uniformly over the set of quantile indices. Specifically, we need to prove that (i)–(vi) of Theorem 3 hold with \(\textbf{w}\in \mathcal {H}\) replaced by \((\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}\).
(a) According to the proof of (i) in Theorem 3, it is easy to obtain that
that is, \(\mathop {\inf }\limits _{(\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}}QPE_{\tau ,n}(\textbf{w})\ge E[\rho _{\tau }(\varepsilon )]-o_{p}(1)\).
(b) We have \(CV_1(\textbf{w})=CV_{11}(\textbf{w})-CV_{12}(\textbf{w})\). To prove (b), it suffices to show that \(\sup _{(\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}}|CV_{11}(\textbf{w})|=o_p(1)\) and \(\sup _{(\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}}|CV_{12}(\textbf{w})|=o_p(1)\). Similar to the proof of (ii) in Theorem 3, to show the uniform convergence, we consider the class of functions \(\mathcal {G}=\{g(\varepsilon _i,\textbf{X}_i,U_i;\textbf{w},\tau ):(\textbf{w},\tau )\in \mathcal {H}\times \mathcal {T}\},\) where \(g(\varepsilon _i,{\textbf {X}}_i,U_i;{\textbf { w}},\tau )=\left[ Q_{\tau }({\textbf {X}}_i,U_i) -\sum _{j=1}^{p}w_{j}Q_j^*({\textbf {X}}_i,U_i)\right] \psi _{\tau }(\varepsilon _i)\). On \(\mathcal {H}\times \mathcal {T}\), we define the metric \(|\cdot |_1^t\) as \(|({\textbf { w}},\tau )-(\tilde{{\textbf { w}}},\tilde{\tau })|_1^{t}=\sum _{j=1}^p|w_j-\tilde{w}_j|+|\tau -\tilde{\tau }|\). Then, the \(\epsilon\)-covering number is \(\mathcal {N}(\epsilon ,\mathcal {H}\times \mathcal {T},|\cdot |_1^{t})=O(1/\epsilon ^{p})\). Further, the \(\epsilon\)-bracketing number satisfies \(\mathcal {N}_{[]}(\epsilon ,\mathcal {G},L_1(P))\le C/\epsilon ^{p}\), and it follows from the Glivenko–Cantelli theorem that \(\sup _{(\textbf{w},\tau )\in \mathcal {H}\times {\mathcal {T}}}|CV_{11}(\textbf{w})|=o_p(1)\).
We also have
Hence
Similarly, (iii), (iv), (v), and (vi) follow from the corresponding proofs in Theorem 3, together with the facts that \(\sup _{\tau \in \mathcal {T}}\Vert \widehat{{\varvec{\theta }}}_{(j)}-{{\varvec{\theta }}}^{*}_{(j)}\Vert =o_p(1)\) and \(\sup _{\tau \in \mathcal {T}}\Vert \widehat{{\varvec{\theta }}}_{(j),n_0}-{{\varvec{\theta }}}^{*}_{(j)}\Vert =o_p(1)\). This completes the proof of Theorem 4. \(\square\)
Cite this article
Zhan, Z., Li, Y., Yang, Y. et al. Model averaging for semiparametric varying coefficient quantile regression models. Ann Inst Stat Math 75, 649–681 (2023). https://doi.org/10.1007/s10463-022-00857-z