Abstract
In this paper, a new variable selection procedure based on weighted composite quantile regression is proposed for varying coefficient models with a diverging number of parameters. The proposed method combines basis function approximation with the group SCAD penalty, and it achieves both robustness and efficiency. Furthermore, the theoretical properties of the procedure, including consistency in variable selection and the oracle property in estimation, are established under suitable assumptions. Finally, the finite-sample behavior of the estimator is evaluated through simulation studies. In addition, some interesting extensions are made to separate constant coefficients from varying coefficients.
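As a concrete illustration of the objective described above, the following sketch assembles a weighted composite quantile loss over several quantile levels with a group-SCAD penalty on the spline coefficient groups. This is a minimal sketch, not the paper's implementation: all names (`wcqr_objective`, `Pi`, `groups`, and so on) are illustrative, and the SCAD form follows the standard parametrization of Fan and Li (2001).

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lambda(t), standard parametrization (Fan and Li 2001)."""
    t = np.abs(t)
    return np.where(
        t <= lam, lam * t,
        np.where(t <= a * lam,
                 (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                 lam**2 * (a + 1) / 2))

def wcqr_objective(gamma, c, Pi, y, taus, weights, lam, groups):
    """Penalized weighted composite quantile loss (illustrative names).

    gamma   : stacked spline coefficients
    c       : intercepts c_{tau_k}, one per quantile level
    Pi      : n x d spline design matrix
    taus    : quantile levels tau_1, ..., tau_q
    weights : composite weights omega_1, ..., omega_q
    groups  : list of index arrays, one per coefficient function
    """
    fit = Pi @ gamma
    loss = sum(w * check_loss(y - fit - ck, tau).sum()
               for tau, w, ck in zip(taus, weights, c))
    # group penalty on the Euclidean norm of each coefficient block
    pen = len(y) * sum(scad_penalty(np.linalg.norm(gamma[g]), lam)
                       for g in groups)
    return loss + pen
```

The flat region of SCAD (constant for large arguments) is what allows large coefficient groups to escape shrinkage, which underlies the oracle property discussed below.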
References
Ahmad I, Leelahanon S, Li Q (2005) Efficient estimation of a semi-parametric partially linear varying coefficient model. Ann Stat 33:258–283
Antoniadis A, Gijbels I, Lambert-Lacroix S (2014) Penalized estimation in additive varying coefficient models using grouped regularization. Stat Pap 55:727–750
Bradic J, Fan J, Wang W (2011) Penalized composite quasi-likelihood for ultrahigh dimensional variable selection. J R Stat Soc Ser B 73(3):325–349
Chen J, Chen Z (2008) Extended Bayesian information criteria for model selection with large model space. Biometrika 95:759–771
de Boor C (2001) A practical guide to splines. Springer, New York
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan J, Li R (2006) Statistical challenges with high dimensionality: feature selection in knowledge discovery, vol III. In: Proceedings of the Madrid international congress of mathematicians, pp 595–622
Fan J, Zhang W (1999) Statistical estimation in varying coefficient models. Ann Stat 27:1491–1518
Fan J, Zhang W (2000) Simultaneous confidence bands and hypotheses testing in varying-coefficient models. Scand J Stat 27:715–731
Fan J, Zhang W (2008) Statistical methods with varying coefficient models. Stat Interface 1:179–195
Guo J, Tang M, Tian M, Zhu K (2013) Variable selection in high-dimensional partially linear additive models for composite quantile regression. Comput Stat Data Anal 65:56–67
Guo J, Tian M, Zhu K (2012) New efficient and robust estimation in varying-coefficient models with heteroscedasticity. Stat Sin 22:1075–1101
Hastie T, Tibshirani R (1993) Varying-coefficient models. J R Stat Soc Ser B 55:757–796
Hu T, Xia Y (2012) Adaptive semi-varying coefficient model selection. Stat Sin 22:575–599
Hunter D, Lange K (2000) Quantile regression via an MM algorithm. J Comput Graph Stat 9:60–77
Hunter D, Li R (2005) Variable selection using MM algorithms. Ann Stat 33:1617–1642
Jiang J, Zhao Q, Hui YV (2001) Robust modelling of ARCH models. J Forecast 20:111–133
Jiang X, Jiang J, Song X (2012) Oracle model selection for nonlinear models based on weighted composite quantile regression. Stat Sin 22:1479–1506
Kai B, Li R, Zou H (2010) Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. J R Stat Soc Ser B 72:49–69
Kai B, Li R, Zou H (2011) New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann Stat 39:305–332
Kim M (2007) Quantile regression with varying coefficients. Ann Stat 35:92–108
Knight K (1998) Limiting distributions for L1 regression estimators under general conditions. Ann Stat 26:755–770
Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge
Li G, Xue L, Lian H (2011) Semi-varying coefficient models with a diverging number of components. J Multivar Anal 102:1166–1174
Noh H, Park B (2010) Sparse varying coefficient models for longitudinal data. Stat Sin 20:1183–1202
Noh H, Chung K, Van Keilegom I (2012) Variable selection of varying coefficient models in quantile regression. Electron J Stat 6:1220–1238
Silverman BW (1986) Density estimation. Chapman and Hall, London
Tang Q, Cheng L (2012) Componentwise B-spline estimation for varying coefficient models with longitudinal data. Stat Pap 53:629–652
Tang Y, Wang HJ, Zhu Z (2013) Variable selection in quantile varying coefficient models with longitudinal data. Comput Stat Data Anal 57:435–449
Tang Y, Wang HJ, Zhu Z, Song X (2012) A unified variable selection approach for varying coefficient models. Stat Sin 22:601–628
Wang L, Li H, Huang J (2008) Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J Am Stat Assoc 103:1556–1569
Wang H, Xia Y (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104:747–757
Wei F, Huang J, Li H (2011) Variable selection and estimation in high-dimensional varying coefficient models. Stat Sin 21:1515–1540
Xue L, Qu A (2012) Variable selection in high-dimensional varying-coefficient models with global optimality. J Mach Learn Res 13:1973–1998
Yang H, Guo C, Lv J (2014) Variable selection for generalized varying coefficient models with longitudinal data. Stat Pap (accepted). doi:10.1007/s00362-014-0647-x
Zhao P, Xue L (2010) Variable selection for semiparametric varying coefficient partially linear errors-in-variables models. J Multivar Anal 101:1872–1883
Zhao W, Zhang R, Lv Y, Zhao J (2013) Variable selection of the quantile varying coefficient regression models. J Korean Stat Soc 42:343–358
Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36:1108–1126
Acknowledgments
The authors are very grateful to the editor, associate editor, and two anonymous referees for their detailed comments on the earlier version of the manuscript, which led to a much improved paper. This work is supported by the National Natural Science Foundation of China (Grant No. 11171361) and the Chongqing University Postgraduates’ Innovation Project.
Appendix
Let C denote a generic constant that might assume different values at different places.
Lemma 1
Suppose \({\varvec{\pi }^{(j)}}{(u)^T}\varvec{\gamma }_j^0\) is the best approximating spline function for \({\beta _j}(u)\) and \({\varvec{\gamma } ^0} = {(\varvec{\gamma } _1^{0T},\ldots ,\varvec{\gamma } _{{p_n}}^{0T})^T}\). Under conditions (C1)–(C5), there exist constants \(C_1\) and \(C_2\) such that
(a) \(\mathop {\sup }\limits _{u \in [0,1]} \left| {{\beta _j}(u) - {\varvec{\pi }^{(j)}} {{(u)}^T}\varvec{\gamma } _j^0} \right| \le C_1 K_n^{ - r}\),

(b) \(\mathop {\sup }\limits _{(u,{\varvec{X}}) \in [0,1] \times {R^{{p_n}}}} \left| {{{\varvec{X}}^T}\varvec{\beta } (u) - \varvec{\Pi }^T{\varvec{\gamma } ^0}} \right| \le C_2 K_n^{ - r}\sqrt{{p_n}}\).
Let \({R_{ni}} = \varvec{\Pi } _{i}^T{\varvec{\gamma } ^0} - \mathbf{X }_i^T\varvec{\beta } ({U_i}).\) By (b) of Lemma 1, it is easy to see that \(\mathop {\max }\limits _{i} \left| {{R_{ni}}} \right| \le C_2 K_n^{ - r}\sqrt{{p_n}}\).
Proof of Lemma 1
Since \({\varvec{\pi }^{(j)}}{(u)^T}\varvec{\gamma }_j^0\) is the best approximating spline function for \({\beta _j}(u)\), the result on page 149 of de Boor (2001) gives, for \({\beta _j}(u)\) satisfying condition (C1), \(\mathop {\sup }\limits _{u \in [0,1]} \left| {{\beta _j}(u) - {\varvec{\pi }^{(j)}} {{(u)}^T}\varvec{\gamma } _j^0} \right| \le C_1 K_n^{ - r}\). This proves Lemma 1 (a). Now we show Lemma 1 (b). Let
So \(\varvec{\Pi }= {{\varvec{B}(u)}^T}\varvec{X}\). Using condition (C3), we have
where \({\lambda ^{\max }} = \mathop {\sup }\limits _{u \in [0,1]} {\lambda _{\max }}\left\{ {E(\varvec{X}{\varvec{X}^T}\left| {U = u} \right. )} \right\} \) and \({\lambda _{\max }}(\varvec{A})\) denotes the maximum eigenvalue of a positive definite matrix \(\varvec{A}\).
Thus, \(\mathop {\sup }\limits _{(u,{\varvec{X}}) \in [0,1] \times {R^{{p_n}}}} \left| {{{\varvec{X}}^T}\varvec{\beta } (u) - \varvec{\Pi }^T{\varvec{\gamma } ^0}} \right| \le {C_2}\sqrt{{p_n}} K_n^{ - r}\), where \({C_2} = \sqrt{{\lambda ^{\max }}C_1^2} \). This completes the proof. \(\square \)
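The \(K_n^{-r}\) approximation rate in Lemma 1 (a) can be observed numerically. The sketch below, with a smooth test coefficient function of my own choosing (not from the paper), fits least-squares cubic B-splines with an increasing number of interior knots via SciPy and records the sup-norm error, which shrinks as the knot count grows.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

def spline_sup_error(f, n_interior_knots, degree=3, grid=2000):
    """Sup-norm error of the least-squares spline fit to f on [0, 1]."""
    u = np.linspace(0, 1, grid)
    # interior knots, plus boundary knots of multiplicity degree + 1
    t = np.linspace(0, 1, n_interior_knots + 2)[1:-1]
    knots = np.r_[[0.0] * (degree + 1), t, [1.0] * (degree + 1)]
    spl = make_lsq_spline(u, f(u), knots, k=degree)
    return np.max(np.abs(f(u) - spl(u)))

beta = lambda u: np.sin(2 * np.pi * u)  # a smooth coefficient function
errs = [spline_sup_error(beta, K) for K in (4, 8, 16)]
# the sup-norm errors decrease as the number of knots K_n grows
```

Doubling the number of knots should cut the sup-norm error by roughly \(2^{-4}\) for a cubic spline and a function this smooth, consistent with the \(K_n^{-r}\) bound.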
Proof of Theorem 1
Let \({\alpha _n} = \sqrt{{p_n}} \left( {{n^{ - r/(2r+1)}} + {a_n}} \right) ,{\varvec{u}_n} = \alpha _n^{ - 1}\left( {\varvec{\hat{\gamma }} - {\varvec{\gamma }^0}} \right) \) with \(\varvec{u}_{nj}=\alpha _n^{ - 1}\left( {\varvec{\hat{\gamma }}_j - {\varvec{\gamma }_j^0}} \right) \), \({v_k} {=} \alpha _n^{ - 1}\left( {{\hat{c}_{\tau _k}} - c_{\tau _k}} \right) \) and \(\mathscr {F}_n{=}\left\{ {\left( {{\varvec{u}_n},\varvec{v}} \right) : {{\left\| {{{\left( {\varvec{u}_n^T,\mathbf{v ^T}} \right) }^T}} \right\| }_2} = C} \right\} \), where C is a large enough constant and \(\varvec{c} = {({c_{{\tau _1}}},\ldots ,{c_{{\tau _q}}})^T}, \varvec{v} = {({v_1},\ldots ,{v_q})^T}\). Our aim is to show that for any given \(\eta >0\), there is a large constant C such that, for large n we have
This implies that, with probability tending to one, there is a local minimum \(\varvec{\hat{\gamma }} \) in the ball \(\left\{ {\left( {{\varvec{\gamma } ^0} + {\alpha _n}{\varvec{u}_n},\varvec{c} + {\alpha _n}{\varvec{v}}} \right) {{: }}{{\left\| {{{\left( {\varvec{u}_n^T,{\varvec{v}^T}} \right) }^T}} \right\| }_2} \le C} \right\} \) such that \({\left\| {\varvec{\hat{\gamma }} - {\varvec{\gamma } ^0}} \right\| _2} = {O_p}({\alpha _n})\). Let \({D_n}({\varvec{u}_n},\varvec{v}) = {{PL_n}({\varvec{\gamma } ^0} + {\alpha _n}{\varvec{u}_n},\varvec{c} + {\alpha _n}{\varvec{v}},\varvec{\omega }) - {PL_n}({\varvec{\gamma } ^0},\varvec{c},\varvec{\omega })}\) and \({S_n}({\varvec{u}_n},\varvec{v})= {{L_n}({\varvec{\gamma } ^0} + {\alpha _n}{\varvec{u}_n},\varvec{c} + {\alpha _n}{\varvec{v}},\varvec{\omega }) - {L_n}({\varvec{\gamma }^0},\varvec{c},\varvec{\omega })}\). Then
where \({P_{{\lambda _n}}}({\varvec{u}_n})=n\sum \limits _{j = 1}^{{p_n}} {\left[ {{p_{{\lambda _n}}}\left( {{\left\| {\varvec{\gamma }_j^0 + {\alpha _n}{\varvec{u}_{nj}}} \right\| }_2}\right) - {p_{{\lambda _n}}}\left( {{\left\| {\varvec{\gamma }_j^0} \right\| }_2}\right) } \right] } \).
By the identity (Knight 1998) \(\rho _\tau (u - v) - \rho _\tau (u) = v\left\{ I(u < 0) - \tau \right\} + \int _0^v \left\{ I(u \le s) - I(u \le 0) \right\} \mathrm{d}s\),
we have
Then we can rewrite \({S_n}({\varvec{u}_n},\varvec{v})\) as
where
Note that \(\Vert {\varvec{\Gamma }_n}\Vert =O(1)\) and that \(\varepsilon _i\) is independent of \(\varvec{X}_i\) and \(U_i\); it follows that \(E(\varvec{Z}_n^T{\varvec{u}_n}) =0\) and \(E\{ {(\varvec{Z}_n^T{\varvec{u}_n})^2}\} = \varvec{u}_n^TE({\varvec{Z}_n}\varvec{Z}_n^T){\varvec{u}_n} =O(\left\| {{\varvec{u}_n}} \right\| _2^2)\). Hence, \(\varvec{Z}_n^T{\varvec{u}_n} = {O_p}(\left\| {{\varvec{u}_n}} \right\| _2)\). This, combined with (18), leads to
Applying the Markov inequality and condition (C3), for constant M, we have
So \(\mathop {\max }\limits _i \left( {{\alpha _n}\left| {\varvec{\Pi }_i^T{\varvec{u}_n}} \right| } \right) = o_p\left( 1 \right) \).
Thus, it is easy to show that \(\mathop {\max }\limits _i \left( {{\alpha _n}\left| {\varvec{\Pi }_i^T{\varvec{u}_n} + {v_k}} \right| } \right) = o_p\left( 1 \right) \). By condition (C4) and Lebesgue's dominated convergence theorem, we have
Here we use the fact that \(\mathop {\max }\limits _i \left( {{\alpha _n}\left| {\varvec{\Pi }_i^T{\varvec{u}_n} + {v_k}} \right| } \right) = o_p\left( 1 \right) \) in the third step.
Moreover,
Hence
This, combined with (19), yields
By condition (C6), we have
It follows from (19)–(23) that \({D_n}({\varvec{u}_n},\varvec{v})\) in (17) is dominated by the positive quadratic term \( \frac{1}{2}n\alpha _n^2\sum \limits _{k = 1}^q {{\omega _k} f({c_{{\tau _k}}}) {(v_k^2 + \varvec{u}_n^T {\varvec{\Gamma }_n}{\varvec{u}_n} + 2{v_k} {\varvec{\mu }_n^T}{\varvec{u}_n})}} \) as long as \({\left\| {{\varvec{u}_n}} \right\| _2}\) and \({\left\| {{\varvec{v}}} \right\| _2}\) are large enough. This proves (16). By Lemma 1, we have
This completes the proof. \(\square \)
Proof of Theorem 2 (a)
We use proof by contradiction. Suppose that there exists a \({s} + 1 \le {j_0} \le {p_n}\) such that the probability of \({\hat{\beta }_{{j_0}}}(u)\) being a zero function does not converge to one. Then, there exists \(\eta > 0\) such that, for infinitely many n, \(P({\varvec{\hat{\gamma }}_{{j_0}}} \ne 0) = P({\hat{\beta }_{{j_0}}} (u)\ne 0) \ge \eta .\) Let \({\varvec{\hat{\gamma }} ^*}\) be the vector obtained from \(\varvec{\hat{\gamma }} \) with \({\varvec{\hat{\gamma }} _{{j_0}}}\) being replaced by \(\varvec{0}\). It will be shown that there exists a \(\delta > 0\) such that \({PL_n}\left( {\varvec{\hat{\gamma }} ,\varvec{\hat{c}}; \varvec{\omega } } \right) -{PL_n}\left( {\varvec{\hat{\gamma }}^* ,\varvec{\hat{c}}; \varvec{\omega } } \right) > 0\) with probability at least \(\delta \) for infinitely many n, which contradicts the fact that \({PL_n}\left( {\varvec{\hat{\gamma }} ,\varvec{\hat{c}}; \varvec{\omega } } \right) -{PL_n}\left( {\varvec{\hat{\gamma }}^* ,\varvec{\hat{c}}; \varvec{\omega } } \right) \le 0 \).
By Theorem 1, we have \( {\left\| {{{\varvec{\hat{\gamma }} }_j} -\varvec{ \gamma } _j^0} \right\| _2} = {O_p}({n^{{{ - r} / {(2r + 1)}}}})\). Since \(\varvec{\gamma } _j^0 = \varvec{0}\) for \(j=s+1,\ldots ,p_n+1\), we have \({\left\| {{{\varvec{\hat{\gamma }} }_j}} \right\| _2} = {O_p}({n^{{{ - r} / {(2r + 1)}}}})\) for \(j=s+1,\ldots ,p_n+1\). So \({\left\| {{{\varvec{\hat{\gamma }}}_{j_0}}} \right\| _2} = {O_p}({n^{{{ - r} / {(2r + 1)}}}})\). With probability tending to one, \({{{\left\| {{{\varvec{\hat{\gamma }}}_{{j_0}}}} \right\| }_2} \le {\lambda _n}}\), since \({n^{r / {(2r + 1)}}} {\lambda _n}/ {\sqrt{{p_n}}} \rightarrow \infty \). By the definition of \({p_{\lambda _n} }(.)\), we have \(P\left\{ {{p_{{\lambda _n}}}({{\left\| {{{\varvec{\hat{\gamma }} }_{{j_0}}}} \right\| }_2}) = {\lambda _n}{{\left\| {{{\varvec{\hat{\gamma }}}_{{j_0}}}} \right\| }_2}} \right\} \rightarrow 1.\)
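For reference, the SCAD penalty \(p_{\lambda }(\cdot )\) appealed to here is, in the standard parametrization of Fan and Li (2001) with \(a > 2\) (commonly \(a = 3.7\)):

```latex
p_\lambda(t) =
\begin{cases}
\lambda t, & 0 \le t \le \lambda,\\[2pt]
\dfrac{2a\lambda t - t^2 - \lambda^2}{2(a-1)}, & \lambda < t \le a\lambda,\\[2pt]
\dfrac{\lambda^2 (a+1)}{2}, & t > a\lambda.
\end{cases}
```

In particular \(p_\lambda (t) = \lambda t\) exactly on \([0,\lambda ]\), which is the fact used above once \({\left\| {{{\varvec{\hat{\gamma }}}_{{j_0}}}} \right\| _2} \le {\lambda _n}\) holds with probability tending to one.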
Since \(\left( {{\rho _\tau }(u) - {\rho _\tau }(v)} \right) \ge (\tau - I(v < 0))(u - v)\) for any \(u,v \in R\), we have
where \({r_{ni}} = {R_{ni}} + \varvec{\Pi } _{i}^T({\varvec{\hat{\gamma }} ^*} - {\varvec{\gamma } ^0}).\)
Let \(\mathbf{T _n} = {\sum \limits _{i = 1}^n {(I({\varepsilon _i} < 0) - I({\varepsilon _i} < {r_{ni}} + \hat{c}_{{\tau _k}}))\varvec{\Pi } _i^{({j_0})}} } \). From conditions (C3) and (C4), we obtain that for any \(L > 0\) and \(\Delta ={n^{{{ - r} / {(2r + 1)}}}}\sqrt{{p_n}} \),
Thus
By simple calculation, we obtain
Since \({{{n^{{r / {(2r + 1)}}}}{\lambda _n}} / {\sqrt{{p_n}} }} \rightarrow \infty \), \(n{\lambda _n}\) is of higher order than \(O(n\sqrt{{p_n}} {n^{{{ - r} / {(2r + 1)}}}})\). Combining this with (25) and (26), we conclude that (24) is dominated by \(n{\lambda _n}{\left\| {{{\varvec{\hat{\gamma }}}_{{j_0}}}} \right\| _2}\), which contradicts \({PL_n}\left( {\varvec{\hat{\gamma }} ,\varvec{\hat{c}}; \varvec{\omega } } \right) -{PL_n}\left( {\varvec{\hat{\gamma }}^* ,\varvec{\hat{c}}; \varvec{\omega } } \right) \le 0 \). \(\square \)
Proof of Theorem 2 (b)
Let \({\varvec{u}_n} = \alpha _n^{ - 1}(\varvec{\gamma }- {\varvec{\gamma }^0})\). Partition the vectors \({\varvec{u}_n} = {(\varvec{u}_{na}^T,\varvec{u}_{nb}^T)^T}\) and \({\varvec{\Pi }_i} = {(\varvec{\Pi }_{ia}^T,\varvec{\Pi }_{ib}^T)^T}\) in the same way as \(\varvec{\gamma }= {(\varvec{\gamma }_a^T,\varvec{\gamma }_b^T)^T}\). By (17) and \(P_{\lambda _n}(0)=0\), we can write
where \({P_{{\lambda _n}}}\left( {\varvec{u}_{na}}\right) =n\sum \limits _{j = 1}^{{s}} {\left[ {{p_{{\lambda _n}}}\left( {{\left\| {\varvec{\gamma }_j^0 + {\alpha _n}{\varvec{u}_{nj}}} \right\| }_2}\right) - {p_{{\lambda _n}}}\left( {{\left\| {\varvec{\gamma }_j^0} \right\| }_2}\right) } \right] } \). By taking Taylor’s expansion for \({P_{{\lambda _n}}}\left( {\varvec{u}_{na}}\right) \) at \(\varvec{u}_{na}=0\), we obtain that
Then the minimizer \((\varvec{\hat{u}}_{na}^T,\varvec{\hat{v}}^T)^T\) of \({D_n}(({\varvec{u}_{na}^T, \varvec{0}^T})^T,\varvec{v})\) satisfies the score equations
where \({\psi _\tau }(u) ={{\rho '}_{{\tau }}}(u)= \tau - I(u < 0)\). Then we can write
where \({\varvec{H}_n} = {n^{ - {1 / 2}}}\sum \limits _{i = 1}^n {{\varvec{\Pi }_{ia}}\sum \limits _{k = 1}^q {{\omega _k}} [I({\varepsilon _i} < {R_{ni}} + {c_{{\tau _k}}}) - {\tau _k}],} \)
Taking a Taylor expansion of \({F({R_{ni}} + {c_{{\tau _k}}} + {\alpha _n}(\varvec{\Pi }_{ia}^T{\varvec{\hat{u}}_{na}} + {{\hat{v}}_k}))}\) at \({{R_{ni}} + {c_{{\tau _k}}}}\) gives
By direct calculation of the mean and variance, we can show, as in Jiang et al. (2001), \(B_{n22}^{(k)} = {o_p}({\alpha _n})\). This combined with (28) and (30) leads to
Similarly, (29) can be simplified as
where \( \zeta _{n,k} = {n^{{{ - 1} / 2}}}{\omega _k}\sum \limits _{i = 1}^n {[I({\varepsilon _i} < {c_{{\tau _k}}}+R_{ni}) - {\tau _k}]} \). Solving (31) and (32), we obtain that
Let \(\varvec{H}_n^* = {n^{ - {1 / 2}}}\sum \limits _{i = 1}^n {{\varvec{\Pi }_{ia}}\sum \limits _{k = 1}^q {{\omega _k}} [I({\varepsilon _i} < {c_{{\tau _k}}}) - {\tau _k}]}\), \( \zeta _{n,k}^* = {n^{{{ - 1} / 2}}}{\omega _k}\sum \limits _{i = 1}^n [I({\varepsilon _i} < {c_{{\tau _k}}}) - {\tau _k}] \). Following Jiang et al. (2012), we have
Put \({\eta _{i,k}} = I({\varepsilon _i} < {R_{ni}} + {c_{{\tau _k}}}) - {\tau _k}\) and \(\eta _{i,k}^* = I({\varepsilon _i} < {c_{{\tau _k}}}) - {\tau _k}\). Moreover,
Thus, we have
By Slutsky’s theorem, conditioning on \(\mathscr {H}\), we have
Note that \({\varvec{u}_{na}} = \alpha _n^{ - 1}(\varvec{\gamma }_a - {\varvec{\gamma }_ a^0})\). It follows that
\(\square \)
Proof of Theorem 2 (c)
By the proof of Theorem 2 (a), we know immediately that \({\varvec{\hat{\gamma }} _{b,{\lambda _n}}} = 0\) with probability tending to one. Consequently, \({\varvec{\hat{\gamma }}_{a,{\lambda _n}}}\) must be the solution of the following normal equation
On the other hand, the oracle estimator must be the solution of the normal equation
So we have
Furthermore, the first term on the left-hand side of (36) can be written as
For \(\varvec{G}_1\) and \(\varvec{G}_2\), after some direct calculation, we have
where \({\varvec{\hat{S}}_{na}} = {n^{ - 1}}f{{(}}{c_{{\tau _k}}}{{)}}\sum \limits _{i = 1}^n {{\varvec{\Pi } _{ia}}\varvec{\Pi } _{ia}^T} .\) So
where \({\hat{\lambda }_{\min ,j }} = {\inf _u}{\lambda _{\min }}({\varvec{\hat{S}}_{na,j}}).\)
Thus, we have \(\mathop {\sup }\limits _{u \in [0,1]}| {{{{\hat{\beta }}}_{aj}}(u) - {{{\hat{\beta }} }_{ora,j}}(u)}| ^2 = {o_p}({n^{{{ - 2r} / {(2r + 1)}}}})\). This completes the proof. \(\square \)
Proof of Theorem 3
Suppose for some \(s_1 + 1 \le {j_0} \le s\), \({\varvec{\pi } ^{({j_0})T}}{\varvec{\hat{\gamma }} _{{j_0}}}\) does not represent a constant coefficient. Let \({\varvec{\hat{\gamma }} ^*}\) be the vector obtained from \(\varvec{\hat{\gamma }} \) with \({\varvec{\hat{\gamma }} _{{j_0}}}\) being replaced by its projection onto the subspace \(\left\{ {{\varvec{\gamma } _{{j_0}}}{{:}}{\varvec{\pi } ^{({j_0})T}}{\varvec{\gamma } _{{j_0}}}\mathrm{{ represents ~ a ~constant~ coefficient}}} \right\} \). For \(j=s_1+1,\ldots ,s\), \({{{\varvec{\hat{\gamma }} }_j}^T{{{\varvec{F}}}_j}{{\varvec{\hat{\gamma }} }_j} = 0}\). By definition of \({\varvec{\hat{\gamma }}}\) and \({\varvec{\hat{\gamma }}^* }\), we have
Since \({n^{r / {(2r + 1)}}} {\lambda _n} / {\sqrt{{p_n}} } \rightarrow \infty \), we have \({\sqrt{{{\varvec{\hat{\gamma }} }_{{j_0}}}^T{{{\varvec{F}}}_{{j_0}}}{{\varvec{\hat{\gamma }} }_{{j_0}}}} } ={O_p}( {{n^{{{ - r} / {(2r + 1)}}}}}) = o({\lambda _n})\), and \(n{p_{{\lambda _n}}}\left\{ {\sqrt{{{\varvec{\hat{\gamma }} }_{{j_0}}}^T{{{\varvec{F}}}_{{j_0}}}{{\varvec{\hat{\gamma }} }_{{j_0}}}} } \right\} =n\lambda _n{\sqrt{{{\varvec{\hat{\gamma }} }_{{j_0}}}^T{{{\varvec{F}}}_{{j_0}}}{{\varvec{\hat{\gamma }} }_{{j_0}}}} }\) with probability tending to 1, by the definition of the SCAD penalty function. By the proof of Theorem 2 (a), we have \({I} ={O_p}( {n\sqrt{{p_n}} {n^{{{ - r} / {(2r + 1)}}}}} )\left\| {\varvec{\hat{\gamma }} - {{\varvec{\hat{\gamma }} }^*}} \right\| \). Noting that \(\left\| {\varvec{\hat{\gamma }} - {{\varvec{\hat{\gamma }} }^*}} \right\| =\left\| {\varvec{\hat{\gamma }}_{j_0} - {{\varvec{\hat{\gamma }}_{j_0} }^*}} \right\| =O_p( {\sqrt{{{\varvec{\hat{\gamma }} }_{{j_0}}}^T{{{\varvec{F}}}_{{j_0}}}{{\varvec{\hat{\gamma }} }_{{j_0}}}} } ) \), we conclude that \(n{p_{{\lambda _n}}}\left\{ {\sqrt{{{\varvec{\hat{\gamma }} }_{{j_0}}}^T{{{\varvec{F}}}_{{j_0}}}{{\varvec{\hat{\gamma }} }_{{j_0}}}} } \right\} \) dominates the other term in (37), which contradicts \(PL(\varvec{\hat{\gamma }},\varvec{\hat{c}},\varvec{\omega }) - PL({\varvec{\hat{\gamma }} ^*},\varvec{\hat{c}},\varvec{\omega }) \le 0.\) \(\square \)
Proof of Theorem 4
By Theorems 1–3, we only need to consider a correctly specified PLVCM without regularization terms. The problem then reduces to the one studied in Theorem 4.1 of Zou and Yuan (2008), whose results apply directly and yield the asymptotic normality of the parametric component.
\(\square \)
Guo, C., Yang, H. & Lv, J. Robust variable selection in high-dimensional varying coefficient models based on weighted composite quantile regression. Stat Papers 58, 1009–1033 (2017). https://doi.org/10.1007/s00362-015-0736-5