
Robust variable selection in high-dimensional varying coefficient models based on weighted composite quantile regression

  • Regular Article
  • Published in Statistical Papers

Abstract

In this paper, a new variable selection procedure based on weighted composite quantile regression is proposed for varying coefficient models with a diverging number of parameters. The proposed method is based on basis function approximation and the group SCAD penalty. The new estimation method can achieve both robustness and efficiency. Furthermore, the theoretical properties of our procedure, including consistency in variable selection and the oracle property in estimation are established under some suitable assumptions. Finally, the finite sample behavior of the estimator is evaluated by simulation studies. In addition, some interesting extensions are made to separate constant coefficients from varying coefficients.
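The estimator described above couples a weighted composite quantile loss, evaluated at several quantile levels \(\tau_1,\ldots,\tau_q\), with a group SCAD penalty on each covariate's block of spline coefficients. A minimal numerical sketch of that objective (function names, the grouping scheme, and the data layout are illustrative assumptions, not the authors' code; the SCAD form follows Fan and Li 2001):

```python
import numpy as np

def rho(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def scad(t, lam, a=3.7):
    """SCAD penalty p_lambda(t) for t >= 0 (Fan and Li 2001)."""
    t = np.abs(t)
    small = lam * t
    mid = -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1))
    big = (a + 1) * lam**2 / 2
    return np.where(t <= lam, small, np.where(t <= a * lam, mid, big))

def wcqr_objective(gamma, c, Pi, y, taus, weights, lam, groups):
    """Weighted composite quantile loss plus group SCAD penalty.

    gamma  : stacked spline coefficients for all covariates
    c      : one intercept per quantile level tau_k
    Pi     : n x K matrix of basis-expanded covariates
    groups : list of index arrays, one block per covariate
    """
    resid = y - Pi @ gamma
    loss = sum(w * rho(resid - ck, tau).sum()
               for tau, w, ck in zip(taus, weights, c))
    norms = np.array([np.linalg.norm(gamma[g]) for g in groups])
    return loss + len(y) * scad(norms, lam).sum()
```

Minimizing this objective over `gamma` and `c`, for instance with an MM algorithm as in Hunter and Li (2005), gives the penalized fit; groups whose norm is driven to zero correspond to covariates removed from the model.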


References

  • Ahmad I, Leelahanon S, Li Q (2005) Efficient estimation of a semiparametric partially linear varying coefficient model. Ann Stat 33:258–283

  • Antoniadis A, Gijbels I, Lambert-Lacroix S (2014) Penalized estimation in additive varying coefficient models using grouped regularization. Stat Pap 55:727–750

  • Bradic J, Fan J, Wang W (2011) Penalized composite quasi-likelihood for ultrahigh dimensional variable selection. J R Stat Soc Ser B 73(3):325–349

  • Chen J, Chen Z (2008) Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95:759–771

  • de Boor C (2001) A practical guide to splines. Springer, New York

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360

  • Fan J, Li R (2006) Statistical challenges with high dimensionality: feature selection in knowledge discovery. In: Proceedings of the Madrid International Congress of Mathematicians, vol III, pp 595–622

  • Fan J, Zhang W (1999) Statistical estimation in varying coefficient models. Ann Stat 27:1491–1518

  • Fan J, Zhang W (2000) Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scand J Stat 27:715–731

  • Fan J, Zhang W (2008) Statistical methods with varying coefficient models. Stat Interface 1:179–195

  • Guo J, Tang M, Tian M, Zhu K (2013) Variable selection in high-dimensional partially linear additive models for composite quantile regression. Comput Stat Data Anal 65:56–67

  • Guo J, Tian M, Zhu K (2012) New efficient and robust estimation in varying-coefficient models with heteroscedasticity. Stat Sin 22:1075–1101

  • Hastie T, Tibshirani R (1993) Varying-coefficient models. J R Stat Soc Ser B 55:757–796

  • Hu T, Xia Y (2012) Adaptive semi-varying coefficient model selection. Stat Sin 22:575–599

  • Hunter D, Lange K (2000) Quantile regression via an MM algorithm. J Comput Graph Stat 9:60–77

  • Hunter D, Li R (2005) Variable selection using MM algorithms. Ann Stat 33:1617–1642

  • Jiang J, Zhao Q, Hui YV (2001) Robust modelling of ARCH models. J Forecast 20:111–133

  • Jiang X, Jiang J, Song X (2012) Oracle model selection for nonlinear models based on weighted composite quantile regression. Stat Sin 22:1479–1506

  • Kai B, Li R, Zou H (2010) Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. J R Stat Soc Ser B 72:49–69

  • Kai B, Li R, Zou H (2011) New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann Stat 39:305–332

  • Kim M (2007) Quantile regression with varying coefficients. Ann Stat 35:92–108

  • Knight K (1998) Limiting distributions for L1 regression estimators under general conditions. Ann Stat 26:755–770

  • Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge

  • Li G, Xue L, Lian H (2011) Semi-varying coefficient models with a diverging number of components. J Multivar Anal 102:1166–1174

  • Noh H, Park B (2010) Sparse varying coefficient models for longitudinal data. Stat Sin 20:1183–1202

  • Noh H, Chung K, Van Keilegom I (2012) Variable selection of varying coefficient models in quantile regression. Electron J Stat 6:1220–1238

  • Silverman BW (1986) Density estimation. Chapman and Hall, London

  • Tang Q, Cheng L (2012) Componentwise B-spline estimation for varying coefficient models with longitudinal data. Stat Pap 53:629–652

  • Tang Y, Wang HJ, Zhu Z (2013) Variable selection in quantile varying coefficient models with longitudinal data. Comput Stat Data Anal 57:435–449

  • Tang Y, Wang HJ, Zhu Z, Song X (2012) A unified variable selection approach for varying coefficient models. Stat Sin 22:601–628

  • Wang L, Li H, Huang J (2008) Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J Am Stat Assoc 103:1556–1569

  • Wang H, Xia Y (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104:747–757

  • Wei F, Huang J, Li H (2011) Variable selection and estimation in high-dimensional varying coefficient models. Stat Sin 21:1515–1540

  • Xue L, Qu A (2012) Variable selection in high-dimensional varying-coefficient models with global optimality. J Mach Learn Res 13:1973–1998

  • Yang H, Guo C, Lv J (2014) Variable selection for generalized varying coefficient models with longitudinal data. Stat Pap (accepted). doi:10.1007/s00362-014-0647-x

  • Zhao P, Xue L (2010) Variable selection for semiparametric varying coefficient partially linear errors-in-variables models. J Multivar Anal 101:1872–1883

  • Zhao W, Zhang R, Lv Y, Zhao J (2013) Variable selection of the quantile varying coefficient regression models. J Korean Stat Soc 42:343–358

  • Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36:1108–1126


Acknowledgments

The authors are very grateful to the editor, associate editor, and two anonymous referees for their detailed comments on the earlier version of the manuscript, which led to a much improved paper. This work is supported by the National Natural Science Foundation of China (Grant No. 11171361) and the Chongqing University Postgraduates’ Innovation Project.


Corresponding author

Correspondence to Jing Lv.

Appendix

Let C denote a generic constant that may take different values at different places.

Lemma 1

Suppose \({\varvec{\pi }^{(j)}}{(u)^T}\varvec{\gamma }_j^0\) is the best approximating spline function for \({\beta _j}(u)\) and \({\varvec{\gamma } ^0} = {(\varvec{\gamma } _1^{0T},\ldots ,\varvec{\gamma } _{{p_n}}^{0T})^T}\). Under conditions (C1)–(C5), there exist constants \(C_1\) and \(C_2\) such that

  (a)

    \(\mathop {\sup }\limits _{u \in [0,1]} \left| {{\beta _j}(u) - {\varvec{\pi }^{(j)}} {{(u)}^T}\varvec{\gamma } _j^0} \right| \le C_1 K_n^{ - r}\),

  (b)

    \(\mathop {\sup }\limits _{(u,{\varvec{X}}) \in [0,1] \times {R^{{p_n}}}} \left| {{{\varvec{X}}^T}\varvec{\beta } (u) - \varvec{\Pi }^T{\varvec{\gamma } ^0}} \right| \le C_2 K_n^{ - r}\sqrt{{p_n}}\).

    Let \({R_{ni}} = \varvec{\Pi } _{i}^T{\varvec{\gamma } ^0} - \mathbf{X }_i^T\varvec{\beta } ({U_i}).\) By (b) of Lemma 1, it is easy to see that \(\mathop {\max }\limits _{i} \left| {{R_{ni}}} \right| \le C_2 K_n^{ - r}\sqrt{{p_n}}\).
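Lemma 1 (a) is the classical B-spline approximation bound: the sup-norm error of the best spline approximant of a function with smoothness \(r\) decays like \(K_n^{-r}\). A quick empirical illustration using a least-squares cubic spline fit as a stand-in for the best approximant (the target function and all `scipy` usage are illustrative choices, not from the paper):

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

# a smooth function playing the role of a coefficient curve beta_j(u) on [0, 1]
beta = lambda u: np.sin(2 * np.pi * u)
u = np.linspace(0.0, 1.0, 2001)

sup_err = {}
for n_int in (4, 8, 16):                     # number of interior knots
    # clamped cubic knot vector: boundary knots repeated k + 1 = 4 times
    t = np.r_[[0.0] * 4, np.linspace(0, 1, n_int + 2)[1:-1], [1.0] * 4]
    spl = make_lsq_spline(u, beta(u), t, k=3)
    sup_err[n_int] = np.max(np.abs(beta(u) - spl(u)))
```

Each doubling of the number of basis functions shrinks the sup-norm error by roughly \(2^{-4}\) here, consistent with the \(C_1 K_n^{-r}\) bound for cubic splines (\(r = 4\)) and a smooth target.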

Proof of Lemma 1

Since \({\varvec{\pi }^{(j)}}{(u)^T}\varvec{\gamma }_j^0\) is the best approximating spline function for \({\beta _j}(u)\), the result on page 149 of de Boor (2001) gives, for \({\beta _j}(u)\) satisfying condition (C1), \(\mathop {\sup }\limits _{u \in [0,1]} \left| {{\beta _j}(u) - {\varvec{\pi }^{(j)}} {{(u)}^T}\varvec{\gamma } _j^0} \right| \le C_1 K_n^{ - r}\). This proves Lemma 1 (a). We now show Lemma 1 (b). Let

$$\begin{aligned} \varvec{B}(u) = \left( \begin{array}{cccc} {\varvec{\pi }^{(1)}}{(u)^T} &amp; \varvec{0} &amp; \cdots &amp; \varvec{0} \\ \varvec{0} &amp; {\varvec{\pi }^{(2)}}{(u)^T} &amp; \cdots &amp; \varvec{0} \\ \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\ \varvec{0} &amp; \varvec{0} &amp; \cdots &amp; {\varvec{\pi }^{({p_n})}}{(u)^T} \\ \end{array} \right) , \end{aligned}$$

where \({\varvec{\pi }^{(j)}}(u) = {({B_{j1}}(u),\ldots ,{B_{j{K_j}}}(u))^T}\), so that \(\varvec{B}(u)\) is the \({p_n} \times \sum \nolimits _{j = 1}^{{p_n}} {{K_j}}\) block-diagonal basis matrix.

Then \(\varvec{\Pi }= {{\varvec{B}(u)}^T}\varvec{X}\). Using condition (C3), we have

$$\begin{aligned} \begin{array}{l} \mathop {\sup }\limits _{(u,{\varvec{X}}) \in [0,1] \times {R^{{p_n}}}}{\left| {{\varvec{X}^T}\varvec{\beta }(u) - {\varvec{\Pi }^T}{\varvec{\gamma }^0}} \right| ^2}\\ = \mathop {\sup }\limits _{(u,{\varvec{X}}) \in [0,1] \times {R^{{p_n}}}}{\left| {{\varvec{X}^T}\left( {\varvec{\beta }(u) - \varvec{B}(u){\varvec{\gamma }^0}} \right) } \right| ^2} \\ = {\sup _{(u,\varvec{X}) \in [0,1] \times {R^{{p_n}}}}}{\left( {\varvec{\beta }(u) -\varvec{B}(u){\varvec{\gamma }^0}} \right) ^T}\varvec{X}{\varvec{X}^T}\left( {\varvec{\beta }(u) - \varvec{B}(u){\varvec{\gamma }^0}} \right) \\ \le \mathop {\sup }\limits _{(u,X) \in [0,1] \times {R^{{p_n}}}} {\lambda _{\max }}(\varvec{X}{\varvec{X}^T}){\left( {\varvec{\beta }(u) - \varvec{B}\left( u \right) {\varvec{\gamma }^0}} \right) ^T}\left( {\varvec{\beta }(u) - \varvec{B}\left( u \right) {\varvec{\gamma }^0}} \right) \\ \rightarrow \mathop {\sup }\limits _{u \in [0,1]} {\lambda _{\max }}\left\{ {E(\varvec{X}{\varvec{X}^T}\left| {U = u} \right. )} \right\} \sum \nolimits _{j = 1}^{{p_n}} {{{\left( {{\beta _j}(u) - {\pi ^{(j)}}{{(u)}^T}\gamma _j^0} \right) }^2}} \\ \le \mathop {\sup }\limits _{u \in [0,1]} {\lambda _{\max }}\left\{ {E(\varvec{X}{\varvec{X}^T}\left| {U = u} \right. )} \right\} \sum \nolimits _{j = 1}^{{p_n}} {\mathop {\sup }\limits _{u \in [0,1]} {{\left( {{\beta _j}(u) - {\pi ^{(j)}}{{(u)}^T}\gamma _j^0} \right) }^2}} \\ \le {\lambda ^{\max }}C_1^2{p_n}K_n^{ - 2r} \\ \end{array} \end{aligned}$$

where \({\lambda ^{\max }} = \mathop {\sup }\limits _{u \in [0,1]} {\lambda _{\max }}\left\{ {E(\varvec{X}{\varvec{X}^T}\left| {U = u} \right. )} \right\} \) and \({\lambda _{\max }}(\varvec{A})\) denotes the maximum eigenvalues of a positive definite matrix \(\varvec{A}\).

Thus, \(\mathop {\sup }\limits _{(u,{\varvec{X}}) \in [0,1] \times {R^{{p_n}}}} \left| {{{\varvec{X}}^T}\varvec{\beta } (u) - \varvec{\Pi }^T{\varvec{\gamma } ^0}} \right| \le {C_2}\sqrt{{p_n}} K_n^{ - r}\), where \({C_2} = \sqrt{{\lambda ^{\max }}C_1^2} \). This completes the proof. \(\square \)

Proof of Theorem 1

Let \({\alpha _n} = \sqrt{{p_n}} \left( {{n^{ - r/(2r+1)}} + {a_n}} \right) \) and \({\varvec{u}_n} = \alpha _n^{ - 1}\left( {\varvec{\hat{\gamma }} - {\varvec{\gamma }^0}} \right) \) with \(\varvec{u}_{nj}=\alpha _n^{ - 1}\left( {\varvec{\hat{\gamma }}_j - {\varvec{\gamma }_j^0}} \right) \), let \({v_k} = \alpha _n^{ - 1}\left( {{\hat{c}_{\tau _k}} - c_{\tau _k}} \right) \), and let \(\mathscr {F}_n=\left\{ {\left( {{\varvec{u}_n},\varvec{v}} \right) : {{\left\| {{{\left( {\varvec{u}_n^T,{\varvec{v}^T}} \right) }^T}} \right\| }_2} = C} \right\} \), where \(C\) is a sufficiently large constant, \(\varvec{c} = {({c_{{\tau _1}}},\ldots ,{c_{{\tau _q}}})^T}\) and \(\varvec{v} = {({v_1},\ldots ,{v_q})^T}\). Our aim is to show that for any given \(\eta >0\) there is a large constant \(C\) such that, for sufficiently large \(n\),

$$\begin{aligned} P\left\{ {\mathop {\inf }\limits _{\left( {{\varvec{u}_n},\varvec{v}} \right) \in \mathscr {F}_n} {PL_n}({\varvec{\gamma } ^0} + {\alpha _n}{\varvec{u}_n},\varvec{c} + {\alpha _n}{\varvec{v}},\varvec{\omega } ) > {PL_n}({\varvec{\gamma } ^0},\varvec{c},\varvec{\omega })} \right\} \ge 1 - \eta . \end{aligned}$$
(16)

This implies that, with probability tending to one, there is a local minimum \(\varvec{\hat{\gamma }} \) in the ball \(\left\{ {\left( {{\varvec{\gamma } ^0} + {\alpha _n}{\varvec{u}_n},\varvec{c} + {\alpha _n}{\varvec{v}}} \right) : {{\left\| {{{\left( {\varvec{u}_n^T,{\varvec{v}^T}} \right) }^T}} \right\| }_2} \le C} \right\} \) such that \({\left\| {\varvec{\hat{\gamma }} - {\varvec{\gamma } ^0}} \right\| _2} = {O_p}({\alpha _n})\). Let \({D_n}({\varvec{u}_n},\varvec{v}) = {{PL_n}({\varvec{\gamma } ^0} + {\alpha _n}{\varvec{u}_n},\varvec{c} + {\alpha _n}{\varvec{v}},\varvec{\omega }) - {PL_n}({\varvec{\gamma } ^0},\varvec{c},\varvec{\omega })}\) and \({S_n}({\varvec{u}_n},\varvec{v})= {{L_n}({\varvec{\gamma } ^0} + {\alpha _n}{\varvec{u}_n},\varvec{c} + {\alpha _n}{\varvec{v}},\varvec{\omega }) - {L_n}({\varvec{\gamma }^0},\varvec{c},\varvec{\omega })}\). Then

$$\begin{aligned} {D_n}({\varvec{u}_n},\varvec{v}) = {S_n}({\varvec{u}_n},\varvec{v}) + {P_{{\lambda _n}}}({\varvec{u}_n}), \end{aligned}$$
(17)

where \({P_{{\lambda _n}}}({\varvec{u}_n})=n\sum \limits _{j = 1}^{{p_n}} {\left[ {{p_{{\lambda _n}}}\left( {{\left\| {\varvec{\gamma }_j^0 + {\alpha _n}{\varvec{u}_{nj}}} \right\| }_2}\right) - {p_{{\lambda _n}}}\left( {{\left\| {\varvec{\gamma }_j^0} \right\| }_2}\right) } \right] } \).

By the identity (Knight 1998),

$$\begin{aligned} \left| {z - y} \right| - \left| z \right| = - y{\mathrm{sgn}} (z) + 2(y - z)\left\{ {I(0 < z < y) - I(y < z < 0)} \right\} , \end{aligned}$$

we have

$$\begin{aligned} {\rho _\tau }(r - s) - {\rho _\tau }(r) = s(I(r < 0) - \tau ) + \int _0^s {[I(r \le t) - I(r \le 0)]} dt. \end{aligned}$$
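The identity above can be verified numerically: since the integrand \(I(r \le t) - I(r \le 0)\) is piecewise constant in \(t\), the integral has a simple closed form. A small check (illustrative script, not part of the proof):

```python
import numpy as np

def rho(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def integral_term(r, s):
    """Closed form of the integral from 0 to s of I(r <= t) - I(r <= 0) dt."""
    if s >= 0:
        return max(s - r, 0.0) if r > 0 else 0.0
    return max(r - s, 0.0) if r <= 0 else 0.0

rng = np.random.default_rng(1)
for _ in range(1000):
    r, s = rng.normal(size=2)
    tau = rng.uniform(0.05, 0.95)
    lhs = rho(r - s, tau) - rho(r, tau)
    rhs = s * ((r < 0) - tau) + integral_term(r, s)
    assert abs(lhs - rhs) < 1e-12
```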

Then we can rewrite \({S_n}({\varvec{u}_n},\varvec{v})\) as

$$\begin{aligned} {S_n}({\varvec{u}_n},\varvec{v})= & {} \sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {{\alpha _n}(\varvec{\Pi } _i^T{\varvec{u}_n} + {v_k})} } [I({\varepsilon _i} < {R_{ni}} + c_{\tau _k}) - {\tau _k}]\nonumber \\&+\sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {\int _0^{{\alpha _n}(\varvec{\Pi } _i^T{\varvec{u}_n} + {v_k})}{[I({\varepsilon _i} \le x + {R_{ni}} + c_{\tau _k}) - I({\varepsilon _i} \le {R_{ni}} +c_{\tau _k})]} } } dx\nonumber \\= & {} \sqrt{n} \alpha _n \left( {\varvec{Z}_n^T{\varvec{u}_n} + \varvec{z} _n^T\varvec{v}} \right) + \sum \limits _{k = 1}^q {{\omega _k}B_n^{(k)}}, \end{aligned}$$
(18)

where

$$\begin{aligned} B_n^{(k)}= & {} \sum \limits _{i = 1}^n {\int _0^{{\alpha _n}(\varvec{\Pi } _i^T{\varvec{u}_n} + {v_k})} {[I({\varepsilon _i} \le x + {R_{ni}} + c_{\tau _k}) - I({\varepsilon _i} \le {R_{ni}} + c_{\tau _k})]} }dx,\\ {\varvec{Z}_n}= & {} n^{-1/2} \sum \limits _{i = 1}^n {{\varvec{\Pi } _i}\sum \limits _{k = 1}^q {{\omega _k}} [I({\varepsilon _i} < {R_{ni}} + c_{\tau _k}) - {\tau _k}],}\\ {z _{n,k}}= & {} n^{-1/2} \sum \limits _{i = 1}^n {{\omega _k}\left[ {I({\varepsilon _i} < {R_{ni}} +c_{\tau _k}) - {\tau _k}} \right] }\;\, \mathrm{and}\;\, {\varvec{z} _n} = {\left( {{z _{n,1}},\ldots ,{z _{n,q}}} \right) ^T}. \end{aligned}$$

Note that \(\Vert {\varvec{\Gamma }_n}\Vert =O(1)\) and \(\varepsilon _i\) is independent of \(\varvec{X}_i\) and \(U_i\); it follows that \(E(\varvec{Z}_n^T{\varvec{u}_n}) =0\) and \(E\{ {(\varvec{Z}_n^T{\varvec{u}_n})^2}\} = \varvec{u}_n^TE({\varvec{Z}_n}\varvec{Z}_n^T){\varvec{u}_n} =O(\left\| {{\varvec{u}_n}} \right\| _2^2)\). Hence \(\varvec{Z}_n^T{\varvec{u}_n} = {O_p}(\left\| {{\varvec{u}_n}} \right\| _2)\). This combined with (18) leads to

$$\begin{aligned} {S_n}({\varvec{u}_n},\varvec{v}) = \sum \limits _{k = 1}^q {{\omega _k}B_n^{(k)}} + {o_p}\left( n\alpha _n^2 \right) \left\| {{\varvec{u}_n}} \right\| _2. \end{aligned}$$
(19)

Applying the Markov inequality and condition (C3), for some constant \(M\), we have

$$\begin{aligned} P\left( {\mathop {\max }\limits _i \left( {{\alpha _n}\left| {\varvec{\Pi }_i^T{\varvec{u}_n}} \right| } \right) > \eta } \right)&= P\left( {\bigcup \limits _{i = 1}^n {\left( {{\alpha _n}\left| {\varvec{\Pi }_i^T{\varvec{u}_n}} \right| > \eta } \right) } } \right) \\&\le nP\left( {{\alpha _n}\left| {\varvec{\Pi }_1^T{\varvec{u}_n}} \right| > \eta } \right) \\&\le n\frac{{\alpha _n^8}}{{{\eta ^8}}}E{\left( {\varvec{\Pi }_1^T{\varvec{u}_n}} \right) ^8} \\&=n\frac{{\alpha _n^8}}{{{\eta ^8}}}E\left\{ {E\left[ {{{\left( {\varvec{\Pi }_1^T{\varvec{u}_n}} \right) }^8}\left| U \right. } \right] } \right\} \\&\le n\frac{{\alpha _n^8}}{{{\eta ^8}}}{\left\| {{\varvec{u}_n}} \right\| ^8}E\left\{ {E\left( {{{\left\| {{\varvec{\Pi }_1}} \right\| }^8}\left| U \right. } \right) } \right\} \\&\le n\frac{{\alpha _n^8}}{{{\eta ^8}}}{\left\| {{\varvec{u}_n}} \right\| ^8}E\left\{ {E\left( {{{\left\| {{\varvec{X} _1}} \right\| }^8}\left| U \right. } \right) } \right\} \\&\le n\frac{{\alpha _n^8}}{{{\eta ^8}}}{C^8}{M^8} \\&\rightarrow 0 \\ \end{aligned}$$

So \(\mathop {\max }\limits _i \left( {{\alpha _n}\left| {\varvec{\Pi }_i^T{\varvec{u}_n}} \right| } \right) = o_p\left( 1 \right) \).

Thus, it is easy to show that \(\mathop {\max }\limits _i \left( {{\alpha _n}\left| {\varvec{\Pi }_i^T{\varvec{u}_n} + {v_k}} \right| } \right) = o_p\left( 1 \right) \). By condition (C4) and Lebesgue's dominated convergence theorem, we have

$$\begin{aligned}&E({B_n}^{(k)}\left| \mathscr {H} \right. ) \nonumber \\&\quad =E\left\{ \sum \limits _{i = 1}^n {\int _0^{{\alpha _n}(\varvec{\Pi } _i^T{\varvec{u}_n} + {v_k})} {[I({\varepsilon _i} \le x + {R_{ni}} + c_{\tau _k}) - I({\varepsilon _i} \le {R_{ni}} + c_{\tau _k})]} } dx \left| \mathscr {H} \right. \right\} \nonumber \\&\quad = \sum \limits _{i = 1}^n {\int _0^{{\alpha _n}(\varvec{\Pi } _i^T{\varvec{u}_n} + {v_k})} {[F(x + {R_{ni}} + {c_{{\tau _k}}}) - F({R_{ni}} + {c_{{\tau _k}}})]} } dx\nonumber \\&\quad = \sum \limits _{i = 1}^n {\int _0^{{\alpha _n}(\varvec{\Pi }_i^T{\varvec{u}_n} + {v_k})} {[f({R_{ni}} + {c_{{\tau _k}}})x(1 + o_p(1))]} } dx\nonumber \\&\quad =\frac{{f{{(}}{c_{{\tau _k}}}{{)}}}}{2}\alpha _n^2\sum \limits _{i = 1}^n {(v_k^2 + \varvec{u}_n^T{\varvec{\Pi } _i}\varvec{\Pi } _i^T{\varvec{u}_n} + 2{v_k}\varvec{\Pi } _i^T{\varvec{u}_n})} {{(1 + }}{o_p}{{(1))}}\nonumber \\&\quad =\frac{{f{{(}}{c_{{\tau _k}}}{{)}}}}{2}n\alpha _n^2 {(v_k^2 + \varvec{u}_n^T {\varvec{\Gamma }_n}{\varvec{u}_n} + 2{v_k} {\varvec{\mu }_n^T}{\varvec{u}_n})} {{(1 + }}{o_p}{{(1))}}. \end{aligned}$$
(20)

Here we use the fact that \(\mathop {\max }\limits _i \left( {{\alpha _n}\left| {\varvec{\Pi }_i^T{\varvec{u}_n} + {v_k}} \right| } \right) = o_p\left( 1 \right) \) in the third step.

Moreover,

$$\begin{aligned}&{\mathrm{Var}} ({B_n}^{(k)}\left| \mathscr {H} \right. )\nonumber \\&\quad =\mathrm{Var} \left\{ \sum \limits _{i = 1}^n {\int _0^{{\alpha _n}(\varvec{\Pi } _i^T{\varvec{u}_n} + {v_k})} {[I({\varepsilon _i} \le x + {R_{ni}} + c_{\tau _k}) - I({\varepsilon _i} \le {R_{ni}} + c_{\tau _k})]} } dx \left| \mathscr {H} \right. \right\} \nonumber \\&\quad \le \sum \limits _{i = 1}^n E \left[ {{{\left( {\int _0^{{\alpha _n}(\varvec{\Pi }_i^T{\varvec{u}_n} + {v_k})} {\left\{ {I({\varepsilon _i} < x + {R_{ni}} + {c_{{\tau _k}}}) - I({\varepsilon _i} < {R_{ni}} + {c_{{\tau _k}}})} \right\} dx} } \right) }^2}\left| \mathscr {H} \right. } \right] \nonumber \\&\quad \le \sum \limits _{i = 1}^n {\int _0^{\left| {{\alpha _n}(\varvec{\Pi }_i^T{\varvec{u}_n} + {v_k})} \right| } {\int _0^{\left| {{\alpha _n}(\varvec{\Pi }_i^T{\varvec{u}_n} + {v_k})} \right| } {\Bigg \{ {F\left( {R_{ni}} + {c_{{\tau _k}}} + \left| {{\alpha _n}(\varvec{\Pi }_i^T{\varvec{u}_n} + {v_k})} \right| \right) }} } } \nonumber \\&\qquad { - F({R_{ni}} + {c_{{\tau _k}}})} \Bigg \}d{x_1}d{x_2}\nonumber \\&\quad \le o\left( \sum \limits _{i = 1}^n {{{\left| {{\alpha _n}(\varvec{\Pi }_i^T{\varvec{u}_n} + {v_k})} \right| }^2}}\right) \nonumber \\&\quad = {o_p}( n\alpha _n^2 )\left\| {{\varvec{u}_n}} \right\| _2^2 . \end{aligned}$$
(21)

Hence

$$\begin{aligned} B_n^{(k)}=\frac{{f({c_{{\tau _k}}})}}{2}n\alpha _n^2 {(v_k^2 + \varvec{u}_n^T {\varvec{\Gamma }_n}{\varvec{u}_n} + 2{v_k} {\varvec{\mu }_n^T}{\varvec{u}_n})} {(1 + {o_p}(1))}. \end{aligned}$$

This combined with (19), yields that

$$\begin{aligned} {S_n}({\varvec{u}_n},\varvec{v})= & {} \frac{1}{2}n\alpha _n^2\sum \limits _{k = 1}^q {{\omega _k} f({c_{{\tau _k}}}) {(v_k^2 + \varvec{u}_n^T {\varvec{\Gamma }_n}{\varvec{u}_n} + 2{v_k} {\varvec{\mu }_n^T}{\varvec{u}_n})} {(1 + {o_p}(1))} } \nonumber \\&+ \,{o_p}\left( n\alpha _n^2 \right) \left\| {{\varvec{u}_n}} \right\| _2. \end{aligned}$$
(22)

By condition (C6), we have

$$\begin{aligned} {P_{{\lambda _n}}}({\varvec{u}_n})&\ge \sum \limits _{j = 1}^{{s}} {\left[ {n{\alpha _n}{{p'}_{{\lambda _n}}}\left( {{\left\| {\varvec{\gamma }_j^0} \right\| }_2}\right) \frac{{\varvec{\gamma }_j^{0T}{\varvec{u}_{nj}}}}{{{{\left\| {\varvec{\gamma }_j^0} \right\| }_2}}} + \frac{1}{2}n\alpha _n^2{{p''}_{{\lambda _n}}}\left( {{\left\| {\varvec{\gamma }_j^0} \right\| }_2}\right) \varvec{u}_{nj}^T{\varvec{u}_{nj}}(1 + o(1))} \right] } \nonumber \\&\ge - (n\alpha _n^2{\left\| {{\varvec{u}_n}} \right\| _2} + {o}(n\alpha _n^2)\left\| {{\varvec{u}_n}} \right\| _2^2). \end{aligned}$$
(23)

It follows from (19)–(23) that \({D_n}({\varvec{u}_n},\varvec{v})\) in (17) is dominated by the positive quadratic term \( \frac{1}{2}n\alpha _n^2\sum \limits _{k = 1}^q {{\omega _k} f({c_{{\tau _k}}}) {(v_k^2 + \varvec{u}_n^T {\varvec{\Gamma }_n}{\varvec{u}_n} + 2{v_k} {\varvec{\mu }_n^T}{\varvec{u}_n})}} \) as long as \({\left\| {{\varvec{u}_n}} \right\| _2}\) and \({\left\| {{\varvec{v}}} \right\| _2}\) are large enough. This proves (16). By Lemma 1, we have

$$\begin{aligned} {n^{ - 1}}\sum \limits _{i = 1}^n {{{{({{\hat{\beta }}_j}({U_i}) - {\beta _j}({U_i}))}^2}} }\le & {} \frac{2}{n}\sum \limits _{i = 1}^n { {{{(\varvec{\pi } _i^{(j)T}({{\varvec{\hat{\gamma }}}_j} - \varvec{\gamma } _j^0))}^2}} } + \frac{2}{n}\sum \limits _{i = 1}^n { {{{(\varvec{\pi } _i^{(j)T}{\varvec{\gamma }_j}^0 - {\beta _j}({U_i}))}^2}} } \\\le & {} \frac{2}{n} {{{({{\varvec{\hat{\gamma }}}_j} - \varvec{\gamma } _j^0)}^T}\sum \limits _{i = 1}^n {\varvec{\pi } _i^{(j)}\varvec{\pi } _i^{(j)T}} ({{\varvec{\hat{\gamma }} }_j} - \varvec{\gamma } _j^0) + } {C^2K _n^{-2r}}\\= & {} {O_p}({n^{{{ - 2r} / {(2r + 1)}}}} ). \end{aligned}$$

This completes the proof. \(\square \)
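The rate \({n^{-2r/(2r+1)}}\) reflects the usual bias–variance balance for spline estimators. As a reading aid (a standard calculation, not part of the paper's argument), choosing \(K_n\) to equate the squared approximation error from Lemma 1 with the estimation error for \(K_n\) coefficients gives

```latex
% squared approximation (bias) error vs. estimation (variance) error
K_n^{-2r} \asymp \frac{K_n}{n}
\;\Longrightarrow\;
K_n \asymp n^{1/(2r+1)},
\qquad
\frac{K_n}{n} \asymp n^{-2r/(2r+1)} .
```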

Proof of Theorem 2 (a)

We argue by contradiction. Suppose that there exists a \({s} + 1 \le {j_0} \le {p_n}\) such that the probability of \({\hat{\beta }_{{j_0}}}(u)\) being a zero function does not converge to one. Then there exists \(\eta > 0\) such that, for infinitely many \(n\), \(P({\varvec{\hat{\gamma }}_{{j_0}}} \ne 0) = P({\hat{\beta }_{{j_0}}} (u)\ne 0) \ge \eta .\) Let \({\varvec{\hat{\gamma }} ^*}\) be the vector obtained from \(\varvec{\hat{\gamma }} \) with \({\varvec{\hat{\gamma }} _{{j_0}}}\) replaced by \(\varvec{0}\). It will be shown that there exists a \(\delta > 0\) such that \({PL_n}\left( {\varvec{\hat{\gamma }} ,\varvec{\hat{c}}; \varvec{\omega } } \right) -{PL_n}\left( {\varvec{\hat{\gamma }}^* ,\varvec{\hat{c}}; \varvec{\omega } } \right) > 0\) with probability at least \(\delta \) for infinitely many \(n\), which contradicts the fact that \({PL_n}\left( {\varvec{\hat{\gamma }} ,\varvec{\hat{c}}; \varvec{\omega } } \right) -{PL_n}\left( {\varvec{\hat{\gamma }}^* ,\varvec{\hat{c}}; \varvec{\omega } } \right) \le 0 \).

By Theorem 1, we have \( {\left\| {{{\varvec{\hat{\gamma }} }_j} -\varvec{ \gamma } _j^0} \right\| _2} = {O_p}({n^{{{ - r} / {(2r + 1)}}}})\). Since \(\varvec{\gamma } _j^0 = \varvec{0}\) for \(j=s+1,\ldots ,p_n\), we have \({\left\| {{{\varvec{\hat{\gamma }} }_j}} \right\| _2} = {O_p}({n^{{{ - r} / {(2r + 1)}}}})\) for \(j=s+1,\ldots ,p_n\); in particular, \({\left\| {{{\varvec{\hat{\gamma }}}_{j_0}}} \right\| _2} = {O_p}({n^{{{ - r} / {(2r + 1)}}}})\). With probability tending to one, \({{{\left\| {{{\varvec{\hat{\gamma }}}_{{j_0}}}} \right\| }_2} \le {\lambda _n}}\), since \({n^{r / {(2r + 1)}}} {\lambda _n}/ {\sqrt{{p_n}}} \rightarrow \infty \). By the definition of \({p_{\lambda _n} }(\cdot )\), we have \(P\left\{ {{p_{{\lambda _n}}}({{\left\| {{{\varvec{\hat{\gamma }} }_{{j_0}}}} \right\| }_2}) = {\lambda _n}{{\left\| {{{\varvec{\hat{\gamma }}}_{{j_0}}}} \right\| }_2}} \right\} \rightarrow 1.\)

Since \(\left( {{\rho _\tau }(u) - {\rho _\tau }(v)} \right) \ge (\tau - I(v < 0))(u - v)\) for any \(u,v \in R\), we have

$$\begin{aligned}&{PL_n}\left( {\varvec{\hat{\gamma }} ,\varvec{\hat{c}}; \varvec{\omega } } \right) -{PL_n}\left( {\varvec{\hat{\gamma }}^* ,\varvec{\hat{c}}; \varvec{\omega } } \right) \nonumber \\&\quad = {L_n}\left( {\varvec{\hat{\gamma }} ,\varvec{\hat{c}}; \varvec{\omega } } \right) -{L_n}\left( {\varvec{\hat{\gamma }}^* ,\varvec{\hat{c}}; \varvec{\omega } } \right) + n\sum \limits _{j = 1}^{{p_n}} {\left( {{p_{{\lambda _n}}}({{\left\| {{{\varvec{\hat{\gamma }}}_j}} \right\| }_2}) - {p_{{\lambda _n}}}({{\left\| {\varvec{\hat{\gamma }} _j^*} \right\| }_2})} \right) } \nonumber \\&\quad = \sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {\left\{ {{\rho _{{\tau _k}}}\left( {{Y_i} - \hat{c}_{{\tau _k}} -\varvec{\Pi }_i^T\varvec{\hat{\gamma }} } \right) - {\rho _{{\tau _k}}}\left( {{Y_i} - \hat{c}_{{\tau _k}} - \varvec{\Pi }_i^T{{\varvec{\hat{\gamma }}}^*}} \right) } \right\} } } + n{\lambda _n}{\left\| {{{\varvec{\hat{\gamma }} }_{{j_0}}}} \right\| _2} \nonumber \\&\quad \ge - \sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {({\tau _k} - I({\varepsilon _i} < 0))} } \varvec{\Pi } _i^T(\varvec{\hat{\gamma }} - {{\varvec{\hat{\gamma }} }^*})\nonumber \\&\qquad - \sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {(I({\varepsilon _i} < 0) - I({\varepsilon _i} < {r_{ni}}+\hat{c}_{{\tau _k}}))} } \varvec{\Pi } _i^T(\varvec{\hat{\gamma }} - {{\varvec{\hat{\gamma }} }^*}) + n{\lambda _n}{\left\| {{{\varvec{\hat{\gamma }}}_{{j_0}}}} \right\| _2} \nonumber \\&\quad \ge \left\{ { - {{\left\| {\sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {({\tau _k} - I({\varepsilon _i} < 0))\varvec{\Pi } _i^{({j_0})}} } } \right\| }_2} } \right. \nonumber \\&\qquad \left. { -{{\left\| {\sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {(I({\varepsilon _i} < 0) - I({\varepsilon _i} < {r_{ni}}+\hat{c}_{{\tau _k}}))\varvec{\Pi } _i^{({j_0})}} } } \right\| }_2}+ n{\lambda _n}} \right\} {\left\| {{{\varvec{\hat{\gamma }} }_{{j_0}}}} \right\| _2},\nonumber \\ \end{aligned}$$
(24)

where \({r_{ni}} = {R_{ni}} + \varvec{\Pi } _{i}^T({\varvec{\hat{\gamma }} ^*} - {\varvec{\gamma } ^0}).\)

Let \({\varvec{T}_n} = {\sum \limits _{i = 1}^n {(I({\varepsilon _i} < 0) - I({\varepsilon _i} < {r_{ni}} + \hat{c}_{{\tau _k}}))\varvec{\Pi } _i^{({j_0})}} } \). From conditions (C3) and (C4), we obtain that for any \(L > 0\) and \(\Delta ={n^{{{ - r} / {(2r + 1)}}}}\sqrt{{p_n}} \),

$$\begin{aligned}&E(\varvec{T}_n^T{\varvec{T}_n}) \\&\quad = E\left\{ {\sum \limits _{i = 1}^n {(I({\varepsilon _i} < 0) - I({\varepsilon _i} < L\Delta ))\varvec{\Pi } _i^{({j_0})T}\sum \limits _{k = 1}^n {{{(I({\varepsilon _k} < 0) - I({\varepsilon _k} < L\Delta ))}}\varvec{\Pi } _k^{({j_0})}} } } \right\} \\&\quad \le nE{\left\{ {(I(\varepsilon < 0) - I(\varepsilon < L\Delta ))\left| {{\varvec{\Pi }^{({j_0})}}} \right| } \right\} ^2} \\&\qquad + n(n - 1){\left[ {E\left\{ {(I(\varepsilon < 0) - I(\varepsilon < L\Delta ))\left| {{\varvec{\Pi }^{({j_0})}}} \right| } \right\} } \right] ^2} \\&\quad \le {M}^2\left\{ {\left( {L\Delta n} \right) + {{\left( {L\Delta n} \right) }^2}} \right\} \\&\quad \le C^2 {n^2}{p_n}{n^{{{ -2 r} / {(2r + 1)}}}}.\\ \end{aligned}$$

Thus

$$\begin{aligned} {{{\left\| {\sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {(I({\varepsilon _i} < 0) - I({\varepsilon _i} < {r_{ni}}+\hat{c}_{{\tau _k}}))\varvec{\Pi } _i^{({j_0})}} } } \right\| }_2}}={O_p}(n\sqrt{{p_n}} {n^{{{ - r} / {(2r + 1)}}}}). \end{aligned}$$
(25)

By a simple calculation, we obtain

$$\begin{aligned} {{{\left\| {\sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {({\tau _k} - I({\varepsilon _i} < 0))\varvec{\Pi } _i^{({j_0})}} } } \right\| }_2}}={O_p}(\sqrt{n{p_n}}). \end{aligned}$$
(26)

Since \({{{n^{{r / {(2r + 1)}}}}{\lambda _n}} / {\sqrt{{p_n}} }} \rightarrow \infty \), \(n{\lambda _n}\) is of higher order than \(O(n\sqrt{{p_n}} {n^{{{ - r} / {(2r + 1)}}}})\). Combining this with (25) and (26), we conclude that (24) is dominated by \(n{\lambda _n}{\left\| {{{\varvec{\hat{\gamma }}}_{{j_0}}}} \right\| _2}\), which contradicts \({PL_n}\left( {\varvec{\hat{\gamma }} ,\varvec{\hat{c}}; \varvec{\omega } } \right) -{PL_n}\left( {\varvec{\hat{\gamma }}^* ,\varvec{\hat{c}}; \varvec{\omega } } \right) \le 0 \).   \(\square \)
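The order comparison that closes the proof can be sanity-checked numerically: the ratio of the penalty term \(n\lambda_n\) to the bound \(n\sqrt{p_n}\,n^{-r/(2r+1)}\) is exactly the diverging quantity \(n^{r/(2r+1)}\lambda_n/\sqrt{p_n}\) from the rate condition. A toy check with illustrative sequences (\(r = 2\), \(p_n = n^{1/3}\), \(\lambda_n = n^{-0.2}\) are arbitrary choices satisfying the condition, not the paper's):

```python
import numpy as np

r = 2.0
ns = np.array([1e4, 1e6, 1e8, 1e10])
p = ns ** (1 / 3)          # diverging number of covariates p_n
lam = ns ** (-0.2)         # penalty level lambda_n

# rate condition quantity: must diverge as n grows
cond = ns ** (r / (2 * r + 1)) * lam / np.sqrt(p)

# penalty term n * lambda_n over the stochastic bound n * sqrt(p_n) * n^{-r/(2r+1)}
ratio = (ns * lam) / (ns * np.sqrt(p) * ns ** (-r / (2 * r + 1)))
```

Since `ratio` and `cond` coincide, the penalty term dominates precisely when the rate condition holds.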

Proof of Theorem 2 (b)

Let \({\varvec{u}_n} = \alpha _n^{ - 1}(\varvec{\gamma }- {\varvec{\gamma }^0})\). Partition the vectors \({\varvec{u}_n} = {(\varvec{u}_{na}^T,\varvec{u}_{nb}^T)^T}\) and \({\varvec{\Pi }_i} = {(\varvec{\Pi }_{ia}^T,\varvec{\Pi }_{ib}^T)^T}\) in the same way as \(\varvec{\gamma }= {(\varvec{\gamma }_a^T,\varvec{\gamma }_b^T)^T}\). By (17) and \(P_{\lambda _n}(0)=0\), we can write

$$\begin{aligned} {D_n}(({\varvec{u}_{na}^T, \varvec{0}^T})^T,\varvec{v}) = {S_n}(({\varvec{u}_{na}^T, \varvec{0}^T})^T,\varvec{v}) + {P_{{\lambda _n}}}({\varvec{u}_{na}}), \end{aligned}$$
(27)

where \({P_{{\lambda _n}}}\left( {\varvec{u}_{na}}\right) =n\sum \limits _{j = 1}^{{s}} {\left[ {{p_{{\lambda _n}}}\left( {{\left\| {\varvec{\gamma }_j^0 + {\alpha _n}{\varvec{u}_{nj}}} \right\| }_2}\right) - {p_{{\lambda _n}}}\left( {{\left\| {\varvec{\gamma }_j^0} \right\| }_2}\right) } \right] } \). Taking a Taylor expansion of \({P_{{\lambda _n}}}\left( {\varvec{u}_{na}}\right) \) at \(\varvec{u}_{na}=\varvec{0}\), we obtain

$$\begin{aligned} {P_{{\lambda _n}}}({\varvec{u}_{na}})=n{\alpha _n}\varvec{c}_n^T{\varvec{u}_{na}} + \frac{1}{2}n\alpha _n^2\varvec{u}_{na}^T{\varvec{\Sigma }_{{\lambda _n}}}{\varvec{u}_{na}}(1 + o(1)). \end{aligned}$$

Then the minimizer \((\varvec{\hat{u}}_{na}^T,\varvec{\hat{v}}^T)^T\) of \({D_n}(({\varvec{u}_{na}^T, \varvec{0}^T})^T,\varvec{v})\) satisfies the score equations

$$\begin{aligned} \begin{array}{l} {n^{ - 1}}\sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {{\varvec{\Pi }_{ia}}{\psi _{{\tau _k}}}({\varepsilon _i} - {c_{{\tau _k}}} -R_{ni}- {\alpha _n}(\varvec{\Pi }_{ia}^T{\varvec{\hat{u}}_{na}} + {{\hat{v}}_k}))} } (1 + o_p(1)) \\ \quad = {\varvec{c}_n} + {\alpha _n}{\varvec{\Sigma }_{{\lambda _n}}}{\varvec{\hat{u}}_{na}}(1 + {o_p}(1)), \\ \end{array} \end{aligned}$$
(28)
$$\begin{aligned} {\omega _k}\sum \limits _{i = 1}^n {{\psi _{{\tau _k}}}({\varepsilon _i} - {c_{{\tau _k}}} - R_{ni}-{\alpha _n}(\varvec{\Pi }_{ia}^T{\varvec{\hat{u}}_{na}} + {{\hat{v}}_k})) = 0}, \end{aligned}$$
(29)

where \({\psi _\tau }(u) ={{\rho '}_{{\tau }}}(u)= \tau - I(u < 0)\). We can write

$$\begin{aligned} \begin{array}{l} {n^{ - 1}}\sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {{\varvec{\Pi }_{ia}}{\psi _{{\tau _k}}}({\varepsilon _i} - {c_{{\tau _k}}} - {R_{ni}} - {\alpha _n}(\varvec{\Pi }_{ia}^T{\varvec{\hat{u}}_{na}} + {{\hat{v}}_k}))} } \\ = - {n^{ - {1 / 2}}}{\varvec{H}_{n}} + \sum \limits _{k = 1}^q {{\omega _k}(B_{n21}^{(k)} + B_{n22}^{(k)})} , \\ \end{array} \end{aligned}$$
(30)

where \({\varvec{H}_n} = {n^{ - {1 / 2}}}\sum \limits _{i = 1}^n {{\varvec{\Pi }_{ia}}\sum \limits _{k = 1}^q {{\omega _k}} [I({\varepsilon _i} < {R_{ni}} + {c_{{\tau _k}}}) - {\tau _k}],} \)

$$\begin{aligned} \begin{array}{l} B_{n21}^{(k)} = {n^{ - 1}}\sum \limits _{i = 1}^n {{\varvec{\Pi }_{ia}}[F({c_{{\tau _k}}} + {R_{ni}}) - F({R_{ni}} + {c_{{\tau _k}}} + {\alpha _n}(\varvec{\Pi }_{ia}^T{\varvec{\hat{u}}_{na}} + {{\hat{v}}_k}))],} \\ B_{n22}^{(k)} = {n^{ - 1}}\sum \limits _{i = 1}^n {{\varvec{\Pi }_{ia}}\left\{ {[I({\varepsilon _i} < {c_{{\tau _k}}} + {R_{ni}}) - I({\varepsilon _i} < {R_{ni}} + {c_{{\tau _k}}} + {\alpha _n}(\varvec{\Pi }_{ia}^T{\varvec{\hat{u}}_{na}} + {{\hat{v}}_k}))]} \right. } \\ \qquad \qquad \left. -{[F({c_{{\tau _k}}} + {R_{ni}}) - F({R_{ni}} + {c_{{\tau _k}}} + {\alpha _n}(\varvec{\Pi }_{ia}^T{\varvec{\hat{u}}_{na}} + {{\hat{v}}_k}))]} \right\} .\\ \end{array} \end{aligned}$$

Taking a Taylor expansion of \({F({R_{ni}} + {c_{{\tau _k}}} + {\alpha _n}(\varvec{\Pi }_{ia}^T{\varvec{\hat{u}}_{na}} + {{\hat{v}}_k}))}\) at \({{R_{ni}} + {c_{{\tau _k}}}}\) gives

$$\begin{aligned} \begin{aligned} B_{n21}^{(k)}&= - {n^{ - 1}}\sum \limits _{i = 1}^n {{\varvec{\Pi }_{ia}}[f({c_{{\tau _k}}} + {R_{ni}}){\alpha _n}(\varvec{\Pi }_{ia}^T{\varvec{\hat{u}}_{na}} + {{\hat{v}}_k})](1 + o(1)),} \\&= - {\alpha _n}f({c_{{\tau _k}}})({\varvec{\Gamma }_{na}}{\varvec{\hat{u}}_{na}} + {\varvec{\mu }_{na}}{{\hat{v}}_k})(1 + {o_p}(1)). \\ \end{aligned} \end{aligned}$$

By direct calculation of the mean and variance, we can show, as in Jiang et al. (2001), that \(B_{n22}^{(k)} = {o_p}({\alpha _n})\). Combining this with (28) and (30) leads to

$$\begin{aligned} - ({n^{ - {1 / 2}}}{\varvec{H}_{n}} + {\varvec{c}_n}) = {\alpha _n}\left\{ \sum \limits _{k = 1}^q {{\omega _k}f({c_{{\tau _k}}})({\varvec{\Gamma }_{na}}{\varvec{\hat{u}}_{na}} + {\varvec{\mu }_{na}}{{\hat{v}}_k}) + {\varvec{\Sigma }_{{\lambda _n}}}{\varvec{\hat{u}}_{na}}}\right\} (1 + {o_p}(1)).\nonumber \\ \end{aligned}$$
(31)

Similarly, (29) can be simplified as

$$\begin{aligned} {n^{ - 1/2}}\zeta _{n,k}+ {\alpha _n}{\omega _k}f({c_{{\tau _k}}})({{\hat{v}}_k} + \varvec{\mu }_{na}^T{\varvec{\hat{u}}_{na}}(1 + {o_p}(1))) = 0, \end{aligned}$$
(32)

where \( \zeta _{n,k} = {n^{{{ - 1} / 2}}}{\omega _k}\sum \limits _{i = 1}^n {[I({\varepsilon _i} < {c_{{\tau _k}}}+R_{ni}) - {\tau _k}]} \). Solving (31) and (32), we obtain that

$$\begin{aligned} {\alpha _n}\left( {{\varvec{G}_{na}} + \frac{{{\varvec{\Sigma }_{{\lambda _n}}}}}{{{\varvec{\omega }^T}\varvec{f}}}} \right) {\varvec{\hat{u}}_{na}} + \frac{{{\varvec{c}_n}}}{{{\varvec{\omega }^T}\varvec{f}}} = - {n^{ - {1 / 2}}}\left( {\frac{{{\varvec{H}_{n}}}}{{{\varvec{\omega }^T}\varvec{f}}} - {\varvec{\mu }_{na}}\sum \limits _{k = 1}^q {\frac{{{\zeta _{n,k}}}}{{{\varvec{\omega }^T}\varvec{f}}}} } \right) + {o_p}({n^{{{ - 1} / 2}}}). \end{aligned}$$

Let \(\varvec{H}_n^* = {n^{ - {1 / 2}}}\sum \limits _{i = 1}^n {{\varvec{\Pi }_{ia}}\sum \limits _{k = 1}^q {{\omega _k}} [I({\varepsilon _i} < {c_{{\tau _k}}}) - {\tau _k}]}\), \( \zeta _{n,k}^* = {n^{{{ - 1} / 2}}}{\omega _k}\sum \limits _{i = 1}^n [I({\varepsilon _i} < {c_{{\tau _k}}}) - {\tau _k}] \). Following Jiang et al. (2012), we have

$$\begin{aligned} \varvec{e}^T\varvec{G}_{na}^{{{ - 1} / 2}}{{(\varvec{H}_n^* - {\varvec{\mu }_{na}}\sum \nolimits _{k = 1}^q {\zeta _{n,k}^*} )} / {{\varvec{\omega }^T}\varvec{f}}}\mathop \rightarrow \limits ^d N(0,R(q)). \end{aligned}$$
(33)

Put \({\eta _{i,k}} = I({\varepsilon _i} < {R_{ni}} + {c_{{\tau _k}}}) - {\tau _k}\) and \(\eta _{i,k}^* = I({\varepsilon _i} < {c_{{\tau _k}}}) - {\tau _k}\). Moreover,

$$\begin{aligned} \begin{array}{l} Var\left( {(\varvec{H}_n - {\varvec{\mu }_{na}}\sum \nolimits _{k = 1}^q {\zeta _{n,k}} ) - (\varvec{H}_n^* - {\varvec{\mu }_{na}}\sum \nolimits _{k = 1}^q {\zeta _{n,k}^*} )\left| \mathscr {H} \right. } \right) \\ \quad = Var\left( {[{n^{ - {1 / 2}}}\sum \limits _{i = 1}^n {\sum \limits _{k = 1}^q {{\varvec{\Pi }_{ia}}{\omega _k}({\eta _{i,k}} - \eta _{i,k}^*) - {n^{ - {1 / 2}}}{\varvec{\mu }_{na}}\sum \limits _{i = 1}^n {\sum \limits _{k = 1}^q {{\omega _k}({\eta _{i,k}} - \eta _{i,k}^*)} } } } ]\left| \mathscr {H} \right. } \right) \\ \quad \le 2\left\{ {\frac{1}{n}\sum \limits _{i = 1}^n ({{\varvec{\Pi }_{ia}}\varvec{\Pi }_{ia}^T + {\varvec{\mu }_{na}}\varvec{\mu }_{na}^T}) } \right\} Var\left( {\sum \limits _{k = 1}^q {{\omega _k}({\eta _{i,k}} - \eta _{i,k}^*)} \left| \mathscr {H} \right. } \right) \\ \quad \le 2{q^2}\left\{ {\frac{1}{n}\sum \limits _{i = 1}^n ({{\varvec{\Pi }_{ia}}\varvec{\Pi }_{ia}^T + {\varvec{\mu }_{na}}\varvec{\mu }_{na}^T})} \right\} \mathop {\max }\limits _k \left| E \{{\omega _k^2[I({\varepsilon _i} < {R_{ni}} + {c_{{\tau _k}}}) - I({\varepsilon _i} < {c_{{\tau _k}}})]\left| \mathscr {H} \right. }\} \right| \\ \quad \le 2{q^2}\left\{ {\frac{1}{n}\sum \limits _{i = 1}^n ({{\varvec{\Pi }_{ia}}\varvec{\Pi }_{ia}^T + {\varvec{\mu }_{na}}\varvec{\mu }_{na}^T}) } \right\} \mathop {\max }\limits _k \omega _k^2|F({R_{ni}} + {c_{{\tau _k}}}) - F({c_{{\tau _k}}})| \\ \quad = {o}(1). \\ \end{array} \end{aligned}$$

Thus, we have

$$\begin{aligned} \varvec{H}_n- {\varvec{\mu }_{na}}\sum \nolimits _{k = 1}^q {\zeta _{n,k}} \mathop \rightarrow \limits ^p \varvec{H}_n^* - {\varvec{\mu }_{na}}\sum \nolimits _{k = 1}^q {\zeta _{n,k}^*} . \end{aligned}$$
(34)

By Slutsky’s theorem, conditioning on \(\mathscr {H}\), we have

$$\begin{aligned} \varvec{e}^T\varvec{G}_{na}^{{{ - 1} / 2}}{{(\varvec{H}_n - {\varvec{\mu }_{na}}\sum \nolimits _{k = 1}^q {\zeta _{n,k}} )} / {{\varvec{\omega }^T}\varvec{f}}}\mathop \rightarrow \limits ^d N(0,R(q)). \end{aligned}$$
(35)

Note that \({\varvec{u}_{na}} = \alpha _n^{ - 1}(\varvec{\gamma }_a - {\varvec{\gamma }_ a^0})\). It follows that

$$\begin{aligned} \sqrt{n} \varvec{e}^T\varvec{G}_{na}^{{{ - 1} / 2}}({\varvec{G}_{na}} + \frac{{{\varvec{\Sigma }_{{\lambda _n}}}}}{{{\varvec{\omega }^T}\varvec{f}}})\left[ {({\varvec{\hat{\gamma }}_a} - \varvec{\gamma }_a^0) + {{({\varvec{G}_{na}} + \frac{{{\varvec{\Sigma }_{{\lambda _n}}}}}{{{\varvec{\omega }^T}\varvec{f}}})}^{ - 1}}\frac{{{\varvec{c}_n}}}{{{\varvec{\omega }^T}\varvec{f}}}} \right] \mathop \rightarrow \limits ^d N(0,R(q)). \end{aligned}$$

\(\square \)
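For intuition, the score statistic \(\varvec{H}_n^*\) and the check-function derivative \(\psi _\tau (u) = \tau - I(u < 0)\) that drive this proof can be sketched numerically. The following Python snippet is a minimal illustration, not part of the paper: the equally spaced quantile levels \(\tau _k = k/(q+1)\), the uniform weights, and all variable names are our own choices for the sketch.

```python
import numpy as np

def psi(u, tau):
    """Derivative of the quantile check function: psi_tau(u) = tau - I(u < 0)."""
    return tau - (u < 0).astype(float)

rng = np.random.default_rng(0)
n, q, d = 200, 5, 3
taus = np.arange(1, q + 1) / (q + 1)     # equally spaced quantile levels (one common choice)
w = np.full(q, 1.0 / q)                  # uniform weights omega_k (illustrative)
eps = rng.standard_normal(n)             # model errors epsilon_i
c = np.quantile(eps, taus)               # c_{tau_k}: tau_k-quantiles of the errors
Pi = rng.standard_normal((n, d))         # stand-in for the basis design Pi_{ia}

# H_n^*-type statistic: n^{-1/2} sum_i Pi_i sum_k w_k [I(eps_i < c_{tau_k}) - tau_k]
ind = (eps[:, None] < c[None, :]).astype(float)   # (n, q) indicator matrix
score = n ** -0.5 * Pi.T @ ((ind - taus) @ w)
```

Because the \(c_{\tau _k}\) are taken as empirical quantiles of the simulated errors, each column of `ind - taus` is centered near zero, which is exactly why the score statistic is \(O_p(1)\) and asymptotically normal.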

Proof of Theorem 2 (c)

By the proof of Theorem 2 (a), we know immediately that \({\varvec{\hat{\gamma }} _{b,{\lambda _n}}} = \varvec{0}\) with probability tending to one. Consequently, \({\varvec{\hat{\gamma }}_{a,{\lambda _n}}}\) must solve the following normal equation

$$\begin{aligned} \frac{1}{n}\sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {{\varvec{\Pi } _{ia}}{\psi _{{\tau _k}}}\left( {{Y_i} - {\hat{c}_{{\tau _k}}} - \varvec{\Pi }_{ia}^T{{\varvec{\hat{\gamma }}}_a}} \right) } } - \varvec{c}_n=\varvec{0}. \end{aligned}$$

On the other hand, the oracle estimator must be the solution of the normal equation

$$\begin{aligned} \frac{1}{n}\sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {{\varvec{\Pi }_{ia}}{\psi _{{\tau _k}}}\left( {{Y_i} - {\hat{c}_{{\tau _k}}} - \varvec{\Pi }_{ia}^T{{\varvec{\hat{\gamma }} }_{ora}}} \right) } } = \varvec{0}. \end{aligned}$$

So we have

$$\begin{aligned} \begin{array}{l} \left[ \frac{1}{n}\sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {{\varvec{\Pi } _{ia}}{\psi _{{\tau _k}}}\left( {{Y_i} -{\hat{c}_{{\tau _k}}} - \varvec{\Pi }_{ia}^T{{\varvec{\hat{\gamma }} }_{ora}}} \right) } }\right. \\ \left. \quad - \frac{1}{n}\sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {{\varvec{\Pi } _{ia}}{\psi _{{\tau _k}}}\left( {{Y_i} - {\hat{c}_{{\tau _k}}}- \varvec{\Pi }_{ia}^T{{\varvec{\hat{\gamma }} }_a}} \right) } } \right] \\ \quad + {\left( {p'_{{\lambda _n}}}\left( {\left\| {{{\varvec{\hat{\gamma }}}_1}} \right\| _2}\right) \frac{{\varvec{\hat{\gamma }}_1^T}}{{{{\left\| {{{\varvec{\hat{\gamma }} }_1}} \right\| }_2}}},\ldots ,{p'_{{\lambda _n}}}\left( {\left\| {{{\varvec{\hat{\gamma }} }_{{s}}}} \right\| _2}\right) \frac{{\varvec{\hat{\gamma }} _{{s}}^T}}{{{{\left\| {{{\varvec{\hat{\gamma }}}_{{s}}}} \right\| }_2}}}\right) ^T} = \varvec{0}.\\ \end{array} \end{aligned}$$
(36)

Furthermore, the first term on the left-hand side of (36) can be written as

$$\begin{aligned} \begin{array}{l} \frac{1}{n}\sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {{\varvec{\Pi } _{ia}}} } \left[ {{\psi _{{\tau _k}}}\left( {{Y_i} - {{\hat{c}}_{{\tau _k}}} - \varvec{\Pi }_{ia}^T{{\varvec{\hat{\gamma }} }_{ora}}} \right) - {\psi _{{\tau _k}}}\left( {{Y_i} - {c_{{\tau _k}}} - \varvec{\Pi }_{ia}^T{\varvec{\gamma } ^0}} \right) } \right] \\ - \frac{1}{n}\sum \limits _{k = 1}^q {{\omega _k}\sum \limits _{i = 1}^n {{\varvec{\Pi } _{ia}}} } \left[ {{\psi _{{\tau _k}}}\left( {{Y_i} - {{\hat{c}}_{{\tau _k}}} - \varvec{\Pi }_{ia}^T{{\varvec{\hat{\gamma }} }_a}} \right) - {\psi _{{\tau _k}}}\left( {{Y_i} - {c_{{\tau _k}}} - \varvec{\Pi }_{ia}^T{\varvec{\gamma } ^0}} \right) } \right] \\ \buildrel \Delta \over = {\varvec{G}_1}{{ - }}{\varvec{G}_2} .\\ \end{array} \end{aligned}$$

For \(\varvec{G}_1\) and \(\varvec{G}_2\), after some direct calculation, we have

$$\begin{aligned} {\varvec{G}_1} = \sum \limits _{k = 1}^q {{\omega _k}} {\varvec{\hat{S}}_{na}}({\varvec{\gamma } ^0} - {\varvec{\hat{\gamma }} _{ora}}) + {o_p}({n^{{{ - r} / {(2r + 1)}}}}) ,\\ {\varvec{G}_2} = \sum \limits _{k = 1}^q {{\omega _k}} {\varvec{\hat{S}}_{na}}({\varvec{\gamma }^0} - {\varvec{\hat{\gamma }} _a}) + {o_p}({n^{{{ - r} / {(2r + 1)}}}}) , \end{aligned}$$

where \({\varvec{\hat{S}}_{na}} = {n^{ - 1}}f{{(}}{c_{{\tau _k}}}{{)}}\sum \limits _{i = 1}^n {{\varvec{\Pi } _{ia}}\varvec{\Pi } _{ia}^T} .\) So

$$\begin{aligned} \begin{array}{l} \mathop {\sup }\limits _{u \in [0,1]} {\left\| {{{\varvec{\hat{\gamma }}}_{aj}} - {{\varvec{\hat{\gamma }} }_{oraj}}} \right\| _2} \\ = \mathop {\sup }\limits _{u \in [0,1]}{\left\| {{{(\sum \limits _{k = 1}^q {{\omega _k}} {{\varvec{\hat{S}}}_{naj}})}^{ - 1}}{{({{p'}_{{\lambda _n}}}({{\left\| {{{\varvec{\hat{\gamma }}}_j}} \right\| }_2})\frac{{\varvec{\hat{\gamma }}_j}}{{{{\left\| {{{\varvec{\hat{\gamma }}}_j}} \right\| }_2}}})}}} \right\| _2} + {o_p}({n^{{{ - r} / {(2r + 1)}}}}) \\ \le \hat{\lambda }_{\min ,j }^{ - 1}{a_n} +{o_p}({n^{{{ - r} / {(2r + 1)}}}}) \\ ={o_p}({n^{{{ - r} / {(2r + 1)}}}}),\\ \end{array} \end{aligned}$$

where \({\hat{\lambda }_{\min ,j }} = {\inf _u}{\lambda _{\min }}({\varvec{\hat{S}}_{na,j}}).\)

Thus, we have \(\mathop {\sup }\limits _{u \in [0,1]}| {{{{\hat{\beta }}}_{aj}}(u) - {{{\hat{\beta }} }_{ora,j}}(u)}| ^2 = {o_p}({n^{{{ - 2r} / {(2r + 1)}}}})\). This completes the proof. \(\square \)
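The two properties of the SCAD penalty invoked in these proofs — \(p'_{\lambda _n}(t) = \lambda _n\) for \(t \le \lambda _n\) (so \(p_{\lambda _n}(t) = \lambda _n t\) near zero) and \(p'_{\lambda _n}(t) = 0\) for \(t \ge a\lambda _n\) — can be checked directly from the standard Fan–Li form. A minimal Python sketch of that standard form (ours, with the conventional \(a = 3.7\); not code from the paper):

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """Derivative p'_lambda(t) of the SCAD penalty (standard Fan-Li form, a > 2):
    lam on [0, lam], linearly decaying on (lam, a*lam], and 0 beyond a*lam."""
    t = np.asarray(t, dtype=float)
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0.0) / ((a - 1.0) * lam) * (t > lam))

def scad(t, lam, a=3.7):
    """SCAD penalty p_lambda(t), obtained by integrating the derivative piecewise."""
    t = np.asarray(t, dtype=float)
    return np.where(
        t <= lam,
        lam * t,                                            # linear (lasso-like) part
        np.where(
            t <= a * lam,
            -(t**2 - 2.0 * a * lam * t + lam**2) / (2.0 * (a - 1.0)),  # quadratic blend
            (a + 1.0) * lam**2 / 2.0,                       # constant: no penalty growth
        ),
    )
```

The flat region beyond \(a\lambda _n\) is what makes the penalty asymptotically unbiased for large coefficients, while the full-rate region below \(\lambda _n\) produces exact zeros.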

Proof of Theorem 3

Suppose that for some \(s_1 + 1 \le {j_0} \le s\), \({\varvec{\pi } ^{({j_0})T}}{\varvec{\hat{\gamma }} _{{j_0}}}\) does not represent a constant coefficient. Let \({\varvec{\hat{\gamma }} ^*}\) be the vector obtained from \(\varvec{\hat{\gamma }} \) by replacing \({\varvec{\hat{\gamma }} _{{j_0}}}\) with its projection onto the subspace \(\left\{ {{\varvec{\gamma } _{{j_0}}}:{\varvec{\pi } ^{({j_0})T}}{\varvec{\gamma } _{{j_0}}}\mathrm{{~represents~a~constant~coefficient}}} \right\} \); for \(j=s_1+1,\ldots ,s\), membership in this subspace is characterized by \({{{\varvec{\gamma }}_j}^T{{{\varvec{F}}}_j}{{\varvec{\gamma }}_j} = 0}\). By the definition of \({\varvec{\hat{\gamma }}}\) and \({\varvec{\hat{\gamma }}^* }\), we have

$$\begin{aligned} PL(\varvec{\hat{\gamma }},\varvec{\hat{c}},\varvec{\omega }) - PL({\varvec{\hat{\gamma }} ^*},\varvec{\hat{c}},\varvec{\omega }) = I + n{p_{{\lambda _n}}}\left\{ {\sqrt{{{\varvec{\hat{\gamma }} }_{{j_0}}}^T{{{\varvec{F}}}_{{j_0}}}{{\varvec{\hat{\gamma }} }_{{j_0}}}} } \right\} , \end{aligned}$$
(37)

where \(I\) denotes the difference of the corresponding unpenalized loss terms, and the penalty term for \({\varvec{\hat{\gamma }}^*_{j_0}}\) vanishes since its quadratic form is zero.

Since \({n^{r / {(2r + 1)}}} {\lambda _n} / {\sqrt{{p_n}} } \rightarrow \infty \), we have \({\sqrt{{{\varvec{\hat{\gamma }} }_{{j_0}}}^T{{{\varvec{F}}}_{{j_0}}}{{\varvec{\hat{\gamma }} }_{{j_0}}}} } ={O_p}( {{n^{{{ - r} / {(2r + 1)}}}}}) = o({\lambda _n})\), and hence, by the definition of the SCAD penalty function, \(n{p_{{\lambda _n}}}\left\{ {\sqrt{{{\varvec{\hat{\gamma }} }_{{j_0}}}^T{{{\varvec{F}}}_{{j_0}}}{{\varvec{\hat{\gamma }} }_{{j_0}}}} } \right\} =n\lambda _n{\sqrt{{{\varvec{\hat{\gamma }} }_{{j_0}}}^T{{{\varvec{F}}}_{{j_0}}}{{\varvec{\hat{\gamma }} }_{{j_0}}}} }\) with probability tending to 1. By the proof of Theorem 2 (a), we have \({I} ={O_p}( {n\sqrt{{p_n}} {n^{{{ - r} / {(2r + 1)}}}}} )\left\| {\varvec{\hat{\gamma }} - {{\varvec{\hat{\gamma }} }^*}} \right\| \). Noting that \(\left\| {\varvec{\hat{\gamma }} - {{\varvec{\hat{\gamma }} }^*}} \right\| =\left\| {\varvec{\hat{\gamma }}_{j_0} - {{\varvec{\hat{\gamma }}_{j_0} }^*}} \right\| =O_p( {\sqrt{{{\varvec{\hat{\gamma }} }_{{j_0}}}^T{{{\varvec{F}}}_{{j_0}}}{{\varvec{\hat{\gamma }} }_{{j_0}}}} } ) \), we conclude that \(n{p_{{\lambda _n}}}\left\{ {\sqrt{{{\varvec{\hat{\gamma }} }_{{j_0}}}^T{{{\varvec{F}}}_{{j_0}}}{{\varvec{\hat{\gamma }} }_{{j_0}}}} } \right\} \) dominates the other term in (37), which contradicts \(PL(\varvec{\hat{\gamma }},\varvec{\hat{c}},\varvec{\omega }) - PL({\varvec{\hat{\gamma }} ^*},\varvec{\hat{c}},\varvec{\omega }) \le 0.\) \(\square \)

Proof of Theorem 4

We note that, by Theorems 1–3, we only need to consider a correctly specified PLVCM without regularization terms. This reduces the problem to the one studied in Theorem 4.1 of Zou and Yuan (2008), and the results there apply directly, giving the asymptotic normality of the parametric component.

\(\square \)
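The selection mechanism underlying Theorems 2 (a) and 3 — the penalty term dominates the loss difference, forcing spurious group norms (or near-constant coefficient blocks) exactly to zero — can be mimicked by a toy group-thresholding rule. The sketch below is purely didactic and is not the paper's estimation algorithm; the function name and threshold rule are our own simplification.

```python
import numpy as np

def select_groups(gamma_groups, lam):
    """Toy illustration of group-level selection: a group whose l2 norm lies in
    the penalty's full-rate region (norm <= lam) is set exactly to zero,
    mimicking the sparsity produced by the group SCAD penalty."""
    return [g.copy() if np.linalg.norm(g) > lam else np.zeros_like(g)
            for g in gamma_groups]
```

With \(\lambda _n \rightarrow 0\) slowly enough that \(n^{r/(2r+1)}\lambda _n/\sqrt{p_n} \rightarrow \infty \), spurious group norms (of order \(n^{-r/(2r+1)}\)) fall below the threshold while true signals stay above it, so only the truly active groups survive asymptotically.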


Cite this article

Guo, C., Yang, H. & Lv, J. Robust variable selection in high-dimensional varying coefficient models based on weighted composite quantile regression. Stat Papers 58, 1009–1033 (2017). https://doi.org/10.1007/s00362-015-0736-5
