
Penalized expectile regression: an alternative to penalized quantile regression


Abstract

This paper concerns the study of the entire conditional distribution of a response given predictors in a heterogeneous regression setting. A common approach to handling heterogeneous data is quantile regression, which utilizes the minimization of the \(L_1\) norm. As an alternative to quantile regression, we consider expectile regression, which relies on the minimization of the asymmetric \(L_2\) norm and detects heteroscedasticity effectively. We assume that only a small set of predictors is relevant to the response and develop penalized expectile regression with SCAD and adaptive LASSO penalties. With properly chosen tuning parameters, we show that the proposed estimators display oracle properties. A numerical study using simulated and real examples demonstrates the competitive performance of the proposed penalized expectile regression; its combined use with penalized quantile regression is helpful and recommended for practitioners.
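
To make the objective concrete, the asymmetric \(L_2\) loss weights a squared residual by \(\tau\) when it is nonnegative and by \(1-\tau\) otherwise, so \(\tau = 0.5\) recovers ordinary least squares. The following R sketch of the loss and the penalized objective is ours for illustration (function names are not from the paper); the `penalty` argument stands in for SCAD or the adaptive LASSO.

```r
## Asymmetric L2 (expectile) loss: tau * u^2 if u >= 0, (1 - tau) * u^2 if u < 0.
expectile_loss <- function(u, tau) ifelse(u >= 0, tau, 1 - tau) * u^2

## Penalized objective: asymmetric L2 fit plus n times the summed penalty,
## where `penalty` maps |beta_j| to p_lambda(|beta_j|) (e.g., SCAD, adaptive LASSO).
penalized_objective <- function(beta, X, y, tau, penalty) {
  sum(expectile_loss(y - as.vector(X %*% beta), tau)) +
    nrow(X) * sum(penalty(abs(beta)))
}
```

Unlike the \(L_1\) check loss of quantile regression, this loss is continuously differentiable, which underlies the second-order Taylor expansions used in the Appendix.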


References

  • Aigner, D., Amemiya, T., Poirier, D. (1976). On the estimation of production frontiers: Maximum likelihood estimation of the parameters of a discontinuous density function. International Economic Review, 17, 377–396.

  • Belloni, A., Chernozhukov, V. (2011). \(\ell _1\)-penalized quantile regression in high-dimensional sparse models. The Annals of Statistics, 39, 82–130.

  • Belloni, A., Chernozhukov, V., Kato, K. (2015). Uniform post-selection inference for least absolute deviation regression and other z-estimation problems. Biometrika, 102, 77–94.

  • Chatterjee, A., Lahiri, S. N. (2010). Asymptotic properties of the residual bootstrap for Lasso estimators. Proceedings of the American Mathematical Society, 138, 4497–4509.

  • Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.

  • Friberg, H. A. (2014). Rmosek: The R-to-MOSEK optimization interface. R package version 1.2.5.1.

  • Geyer, C. J. (1994). On the asymptotics of constrained M-estimation. The Annals of Statistics, 22, 1993–2010.


  • Gu, Y., Zou, H. (2016). High-dimensional generalizations of asymmetric least squares regression and their applications. The Annals of Statistics, 44, 2661–2694.

  • Harrison, D., Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81–102.

  • Javanmard, A., Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. Journal of Machine Learning Research, 15, 2869–2909.

  • Jones, M. C. (1994). Expectiles and M-quantiles are quantiles. Statistics & Probability Letters, 20, 149–153.


  • Kim, Y., Choi, H., Oh, H. S. (2008). Smoothly clipped absolute deviation on high dimensions. Journal of the American Statistical Association, 103, 1665–1673.

  • Knight, K., Fu, W. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics, 28, 1356–1378.

  • Kocherginsky, M., He, X., Mu, Y. (2005). Practical confidence intervals for regression quantiles. Journal of Computational and Graphical Statistics, 14(1), 41–55.

  • Koenker, R., Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33–50.

  • Koenker, R., Mizera, I. (2014). Convex optimization in R. Journal of Statistical Software, 60, 1–23.

  • Kuan, C. M., Yeh, J. H., Hsu, Y. C. (2009). Assessing value at risk with CARE, the Conditional Autoregressive Expectile models. Journal of Econometrics, 150, 261–270.

  • Li, Y., Zhu, J. (2008). \(l_1\)-norm quantile regression. Journal of Computational and Graphical Statistics, 17, 163–185.

  • Lockhart, R., Taylor, J., Tibshirani, R. J., Tibshirani, R. (2014). A significance test for the Lasso. The Annals of Statistics, 42, 413–468.

  • Minnier, J., Tian, T., Cai, T. (2011). A perturbation method for inference on regularized regression estimates. Journal of the American Statistical Association, 106, 1371–1382.

  • MOSEK ApS (2011). The MOSEK optimization tools manual. Version 7.0. https://www.mosek.com.

  • Newey, W. K., Powell, J. L. (1987). Asymmetric least squares estimation and testing. Econometrica, 55, 819–847.

  • Schnabel, S. K., Eilers, P. H. C. (2009). Optimal expectile smoothing. Computational Statistics and Data Analysis, 53, 4168–4177.

  • Sobotka, F., Radice, R., Marra, G., Kneib, T. (2013a). Estimating the relationship between women’s education and fertility in Botswana by using an instrumental variable approach to semiparametric expectile regression. Journal of the Royal Statistical Society, Series C, 62, 25–45.

  • Sobotka, F., Radice, R., Marra, G., Kneib, T. (2013b). On confidence intervals for semiparametric expectile regression. Statistics and Computing, 23, 135–148.

  • Wang, L., Wu, Y., Li, R. (2012). Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association, 107, 214–222.

  • Wu, Y., Liu, Y. (2009). Variable selection in quantile regression. Statistica Sinica, 19, 801–817.

  • Yuille, A. L., Rangarajan, A. (2003). The concave–convex procedure. Neural Computation, 15, 915–936.

  • Zhang, C. H., Zhang, S. S. (2014). Confidence intervals for low-dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B, 76, 217–242.

  • Ziegel, J. (2014). Coherence and elicitability. Mathematical Finance, 26, 901–918.


  • Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.


  • Zou, H., Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36, 1509–1533.


Acknowledgements

This work is part of the first author’s dissertation. The third author’s research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2013R1A1A2007611).

Author information

Correspondence to Hosik Choi.

Appendix: Proofs of theorems

1.1 Proof of Theorem 1

Following Wu and Liu (2009), it is sufficient to show that for any given \(\delta > 0\), there exists a large constant C such that

$$\begin{aligned} P\left\{ \displaystyle \inf _{\Vert \mathbf {u} \Vert = C} R_{n}({\varvec{\beta }}_0 + \mathbf {u}/\sqrt{n}) > R_{n}({\varvec{\beta }}_0) \right\} \ge 1 - \delta . \end{aligned}$$
(10)

This implies that, with probability at least \(1 - \delta \), there exists a local minimizer of \(R_n\) in the ball \(\{ {\varvec{\beta }}_0 + \mathbf {u}/\sqrt{n} : \Vert \mathbf {u} \Vert \le C \}\), and hence a local minimizer \(\hat{{\varvec{\beta }}}\) satisfying \(\Vert \hat{{\varvec{\beta }}} - {\varvec{\beta }}_0 \Vert = O_p(n^{-\frac{1}{2}})\). Now consider

$$\begin{aligned}&R_{n}({\varvec{\beta }}_0 + \mathbf {u}/\sqrt{n}) - R_{n}({\varvec{\beta }}_0) \\&\quad = \displaystyle \sum _{i = 1}^n \Big ( \rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0 - \mathbf {x}_i^{\text{ T }}\mathbf {u}/\sqrt{n}) - \rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0) \Big ) \\&\qquad +\, n \displaystyle \sum _{j = 1}^p \Big ( p_{\lambda _n}(\vert \beta _{j0} + u_j/\sqrt{n} \vert ) - p_{\lambda _n}(\vert \beta _{j0} \vert )\Big ). \end{aligned}$$

Because \(p'_{\lambda _n}(\theta ) = \lambda _n \{I(\theta \le \lambda _n ) + \frac{(a\lambda _n - \theta )_{+}}{(a-1)\lambda _n}I(\theta > \lambda _n ) \} \ge 0\) for \(a > 2\) and \(\theta > 0\), and \(p_{\lambda _n}(0) = 0\),

$$\begin{aligned} n (p_{\lambda _n}(\vert \beta _{j0} + u_j/\sqrt{n} \vert ) - p_{\lambda _n}(\vert \beta _{j0} \vert )) = n (p_{\lambda _n}(\vert u_j/\sqrt{n} \vert ) - p_{\lambda _n}(0)) \ge 0 \end{aligned}$$

for \(j = q+1, \ldots , p\). Hence,

$$\begin{aligned}&R_{n}({\varvec{\beta }}_0 + \mathbf {u}/\sqrt{n}) - R_{n}({\varvec{\beta }}_0) \\&\quad \ge \displaystyle \sum _{i = 1}^n \Big (\rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0 -\mathbf {x}_i^{\text{ T }}\mathbf {u}/\sqrt{n}) - \rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0) \Big ) \nonumber \\&\qquad +\, n\sum _{j = 1}^q \Big (p_{\lambda _n}(\vert \beta _{j0} + u_j/\sqrt{n} \vert ) - p_{\lambda _n}(\vert \beta _{j0} \vert ) \Big ). \nonumber \end{aligned}$$
(11)

We first consider the second term on the right-hand side of (11). For \(j = 1, \ldots , q\),

$$\begin{aligned}&n (p_{\lambda _n}(\vert \beta _{j0} + u_j/\sqrt{n} \vert ) - p_{\lambda _n}(\vert \beta _{j0} \vert )) \\&\quad = n \Big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{u_j}{\sqrt{n}} + \frac{p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )}{2} \Big (\frac{u_j}{\sqrt{n}} \Big )^2 + o \Big (\frac{p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )}{n} \Big ) \Big )\\&\quad = O \Big ( \sqrt{n} \max _{1 \le j \le q} p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) + \max _{1 \le j \le q} p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \Big ). \end{aligned}$$

For large n,

$$\begin{aligned} p^{'}_{\lambda _n}(\vert \beta _{j0} \vert )= & {} \lambda _n \Big (I(\vert \beta _{j0} \vert \le \lambda _n ) + \frac{(a \lambda _n - \vert \beta _{j0} \vert )_{+}}{(a-1)\lambda _n}I(\vert \beta _{j0} \vert > \lambda _n ) \Big ) \\= & {} \frac{(a \lambda _n - \vert \beta _{j0} \vert )_{+}}{a-1} \rightarrow 0 \text{ as } \lambda _n \rightarrow 0, \\ p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )= & {} -\frac{1}{a-1} I(\lambda _n< \vert \beta _{j0} \vert < a \lambda _n ) \rightarrow 0 \text{ as } \lambda _n \rightarrow 0. \end{aligned}$$
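
For reference, integrating \(p'_{\lambda _n}\) from the display above yields the SCAD penalty in closed form. A small R sketch, ours for illustration, with \(a = 3.7\) as the conventional default suggested by Fan and Li (2001):

```r
## SCAD derivative p'_lambda(theta) for theta > 0, as in the display above.
scad_deriv <- function(theta, lambda, a = 3.7) {
  lambda * ((theta <= lambda) +
            pmax(a * lambda - theta, 0) / ((a - 1) * lambda) * (theta > lambda))
}

## SCAD penalty p_lambda(theta), obtained by integrating p'_lambda from 0:
## linear near zero, quadratic in between, constant beyond a * lambda.
scad_pen <- function(theta, lambda, a = 3.7) {
  ifelse(theta <= lambda,
         lambda * theta,
         ifelse(theta <= a * lambda,
                (2 * a * lambda * theta - theta^2 - lambda^2) / (2 * (a - 1)),
                (a + 1) * lambda^2 / 2))
}
```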

Denote the first and second derivatives of \(\rho _{\tau }(\epsilon _i - t)\) at \(t = 0\) as follows:

$$\begin{aligned} g_\tau (\epsilon _i)= & {} \rho ^{'}_{\tau }(\epsilon _i - t)\mid _{t=0} = -2\tau \epsilon _i I(\epsilon _i \ge 0) - 2(1 - \tau )\epsilon _i I(\epsilon _i< 0),\\ h_\tau (\epsilon _i)= & {} \rho ^{''}_{\tau }(\epsilon _i - t)\mid _{t=0} = 2\tau I(\epsilon _i \ge 0) + 2(1 - \tau )I(\epsilon _i < 0). \end{aligned}$$

Then \(\mathrm {E}(g_\tau (\epsilon _i)) = 0\). Denote \({\mathrm {Var}}(g_\tau (\epsilon _i)) = \sigma _{g_\tau }^2\), \(\mathrm {E}(h_\tau (\epsilon _i)) = \mu _{h_\tau } > 0\) and \({\mathrm {Var}}(h_\tau (\epsilon _i)) = \sigma _{h_\tau }^2, i = 1, \ldots , n\). According to model (3.1), \(\epsilon _i = y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0, i = 1, \ldots , n\). Now we consider the first term on the right-hand side of (11):

$$\begin{aligned}&\displaystyle \sum _{i = 1}^n \Big (\rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0 -\mathbf {x}_i^{\text{ T }}\mathbf {u}/\sqrt{n}) - \rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0) \Big ) \\&\quad = \displaystyle \sum _{i = 1}^n \Big ( g_\tau (\epsilon _i) \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}} + \frac{h_\tau (\epsilon _i)}{2} \Big (\frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}} \Big )^2 + o \Big (\frac{1}{n}\Big ) \Big ). \end{aligned}$$

We note that

$$\begin{aligned} \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}}= & {} \mathrm {E}\left( \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}}\right) + O_p \left( \sqrt{{\mathrm {Var}}\left( \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}} \right) } \right) \\= & {} O_p \left( \sqrt{\mathbf {u}^{\text{ T }}\frac{\sum _{i = 1}^n \mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \mathbf {u} \sigma _{g_\tau }^2} \right) , \end{aligned}$$

and

$$\begin{aligned} \displaystyle \sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \left( \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}} \right) ^2= & {} \frac{\mu _{h_{\tau }}}{2} \mathbf {u}^{\text{ T }}\frac{\sum _{i = 1}^n \mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \mathbf {u} + O_p \left( \sqrt{ \frac{\sigma _{h_\tau }^2}{4} \displaystyle \sum _{i = 1}^n \Big (\frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}} \Big )^4 } \right) \\= & {} \frac{\mu _{h_\tau }}{2} \mathbf {u}^{\text{ T }}\frac{\sum _{i = 1}^n \mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \mathbf {u} + o_p(1). \end{aligned}$$

Therefore, \(R_{n}({\varvec{\beta }}_0 + \mathbf {u}/\sqrt{n}) - R_{n}({\varvec{\beta }}_0)\) is dominated by \( \frac{\mu _{h_\tau }}{2} \mathbf {u}^{\text{ T }}\frac{\sum _{i = 1}^n \mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \mathbf {u}\), for \(\Vert \mathbf {u} \Vert = C\), where C is sufficiently large. In conclusion, there exists a local minimizer of \(R_{n}({\varvec{\beta }})\), \(\hat{{\varvec{\beta }}}^{(\mathrm{SCAD})}\), such that \(\Vert \hat{{\varvec{\beta }}}^{(\mathrm{SCAD})} - {\varvec{\beta }}_0 \Vert = O_p(n^{-\frac{1}{2}})\), if \(\lambda _n \rightarrow 0\) as \(n \rightarrow \infty \). \(\square \)
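
Although the proof needs only the expansion above, the weights \(h_\tau (\epsilon _i)/2\) also suggest how the unpenalized expectile estimator \(\tilde{{\varvec{\beta }}}\) (used later to build the adaptive LASSO weights) can be computed: iteratively reweighted least squares with weights \(\tau\) or \(1 - \tau\) by residual sign. A minimal R sketch of this standard scheme, ours rather than the paper's Rmosek-based implementation:

```r
## Iteratively reweighted least squares for the unpenalized expectile fit.
## Each pass solves a weighted least squares problem whose weights are the
## asymmetric factors tau (residual >= 0) or 1 - tau (residual < 0).
expectile_fit <- function(X, y, tau, n_iter = 50) {
  beta <- solve(crossprod(X), crossprod(X, y))  # start from OLS (tau = 0.5)
  for (it in seq_len(n_iter)) {
    r <- as.vector(y - X %*% beta)
    v <- ifelse(r >= 0, tau, 1 - tau)           # asymmetric weights
    beta <- solve(crossprod(X, v * X), crossprod(X, v * y))
  }
  as.vector(beta)
}
```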

1.2 Proof of Theorem 2

(a) For any \({\varvec{\beta }}_1\) satisfying \({\varvec{\beta }}_1 - {\varvec{\beta }}_{10} = O_p(n^{-\frac{1}{2}})\) and any \({\varvec{\beta }}_2\) with \(0 < \Vert {\varvec{\beta }}_2 \Vert \le Cn^{-\frac{1}{2}}\),

$$\begin{aligned}&R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }}) \nonumber \\&\quad = \displaystyle \sum _{i = 1}^n \Big (\rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_1) - \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_1 - \mathbf {x}_{i2}^{\text{ T }}{\varvec{\beta }}_2) \Big ) - n \displaystyle \sum _{j = q+1}^p p_{\lambda _n}(\vert \beta _{j} \vert ) \nonumber \\&\quad = \displaystyle \sum _{i = 1}^n \left( g_\tau (\epsilon _i) \mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) + \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \Big )^2 + o \Big ((\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}))^2 \Big ) \right) \nonumber \\&\qquad - \displaystyle \sum _{i = 1}^n \left( g_\tau (\epsilon _i) \mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}+ \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\Big )^2 \right. \nonumber \\&\qquad \left. +\, o \Big ((\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }})^2 \Big ) \right) - n \displaystyle \sum _{j = q+1}^p p_{\lambda _n}(\vert \beta _{j} \vert ). \end{aligned}$$
(12)

By Condition 2 and following the proof of Theorem 1,

$$\begin{aligned}&\sum _{i = 1}^n g_\tau (\epsilon _i) \mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) = O_p \left( \sqrt{\sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}\displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \sigma _{g_\tau }^2} \right) = O_p(1),\\&\sum _{i = 1}^n g_\tau (\epsilon _i) \mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\\&\quad = O_p \left( \sqrt{\sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }}) \displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\sigma _{g_\tau }^2} \right) = O_p(1),\\&\sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \Big )^2 = \frac{\mu _{h_\tau }}{2} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}\displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) + o_p(1)\\&\quad = O_p(1),\\&\sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\Big )^2\\&\quad = \frac{\mu _{h_\tau }}{2} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }}) \displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}+ o_p(1) = O_p(1). \end{aligned}$$

Now we consider the last term on the right-hand side of (12). For \(j = q+1, \ldots , p\),

$$\begin{aligned} p_{\lambda _n}(\vert \beta _j \vert )= & {} \displaystyle \lim _{\theta \rightarrow 0^{+}} p_{\lambda _n}(\theta ) + \displaystyle \lim _{\theta \rightarrow 0^{+}} p'_{\lambda _n}(\theta ) \vert \beta _j \vert + o(\vert \beta _j \vert )\\= & {} \lambda _n \vert \beta _j \vert + o(\vert \beta _j \vert ). \end{aligned}$$

Therefore, \(n \sum _{j = q+1}^p p_{\lambda _n}(\vert \beta _j \vert ) = n \lambda _n \Big (\sum _{j = q+1}^p \Big (\vert \beta _j \vert + o(\vert \beta _j \vert /\lambda _n)\Big ) \Big )\). Because \(0 < \Vert {\varvec{\beta }}_2 \Vert \le Cn^{-\frac{1}{2}}\), we have \(\vert \beta _j \vert = O(n^{-\frac{1}{2}})\) for \(j = q+1, \ldots , p\), so \(o(\vert \beta _j \vert /\lambda _n) = o \Big (\displaystyle \frac{1}{\sqrt{n}\lambda _n}\Big )\). Since \(\sqrt{n} \lambda _n \rightarrow \infty \) and \(n \lambda _n \rightarrow \infty \) as \(n \rightarrow \infty \), \(R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }})\) is dominated by

$$\begin{aligned} - n \displaystyle \sum _{j = q+1}^p p_{\lambda _n}(\vert \beta _j \vert ). \end{aligned}$$

Consequently,

$$\begin{aligned} R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }}) \rightarrow -\infty \text{ as } n \rightarrow \infty . \end{aligned}$$

This completes the proof of part (a) of the theorem. \(\square \)

(b) From Theorem 1 and part (a), we know \(\hat{{\varvec{\beta }}}_1\) is a root-n consistent local minimizer of \(R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }})\). Let \({\varvec{\theta }}_1 = \sqrt{n} ({\varvec{\beta }}_1 - {\varvec{\beta }}_{10})\), i.e., \({\varvec{\beta }}_1 = {\varvec{\beta }}_{10} + {\varvec{\theta }}_1/\sqrt{n}\). Then

$$\begin{aligned} R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }})= & {} \displaystyle \sum _{i = 1}^n \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_1) + n \displaystyle \sum _{j = 1}^q p_{\lambda _n}(\vert \beta _j \vert )\\= & {} \displaystyle \sum _{i = 1}^n \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_{10} - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1/\sqrt{n}) + n \displaystyle \sum _{j = 1}^q p_{\lambda _n}(\vert \beta _{j0} + \theta _j/\sqrt{n} \vert ) \\\triangleq & {} Q_n({\varvec{\theta }}_1). \end{aligned}$$

Because \(\hat{{\varvec{\theta }}}_1 = \sqrt{n} (\hat{{\varvec{\beta }}}_1^{(\mathrm{SCAD})} - {\varvec{\beta }}_{10})\) is a local minimizer of \(Q_n({\varvec{\theta }}_1),\)

$$\begin{aligned} \displaystyle \frac{\partial Q_n({\varvec{\theta }}_1)}{\partial \theta _j} \mid _{{\varvec{\theta }}_1 = \hat{{\varvec{\theta }}}_1} = 0, \end{aligned}$$

for \(j = 1, \ldots , q\). To compute this derivative, we expand each part of \(Q_n({\varvec{\theta }}_1)\):

$$\begin{aligned} \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_{10} - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1/\sqrt{n})= & {} \rho _{\tau }(\epsilon _i) + g_\tau (\epsilon _i) \Big (-\frac{\mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1}{\sqrt{n}} \Big )\\&+ \frac{h_\tau (\epsilon _i)}{2} \Big (-\frac{\mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1}{\sqrt{n}} \Big )^2 + o(1), \nonumber \\ \frac{\partial }{\partial \theta _j} \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_{10} - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1/\sqrt{n})= & {} - g_\tau (\epsilon _i) \frac{x_{ij}}{\sqrt{n}} + h_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1}{n}x_{ij}, \nonumber \\ p_{\lambda _n}(\vert \beta _{j0} + \theta _j/\sqrt{n} \vert )= & {} p_{\lambda _n}(\vert \beta _{j0} \vert ) + p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{\theta _j}{\sqrt{n}}\\&+\, \frac{p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )}{2} \Big (\frac{\theta _j}{\sqrt{n}} \Big )^2 + o \Big (\frac{1}{n} \Big ). \end{aligned}$$

Therefore, as \(n \rightarrow \infty ,\)

$$\begin{aligned} n \frac{\partial }{\partial \theta _j} p_{\lambda _n}(\vert \beta _{j0} + \theta _j/\sqrt{n} \vert ) = n \Big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{1}{\sqrt{n}} + p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \frac{\theta _j}{n} \Big ) \rightarrow 0. \end{aligned}$$
(13)

From the proof of Theorem 1, \(p^{'}_{\lambda _n}(\vert \beta _{j0} \vert )\) and \(p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )\) vanish for sufficiently large \(n\) because \(\lambda _n \rightarrow 0\), so (13) holds. Plugging these expansions into \(\displaystyle \frac{\partial Q_n({\varvec{\theta }}_1)}{\partial \theta _j} \mid _{{\varvec{\theta }}_1 = \hat{{\varvec{\theta }}}_1} = 0\), for \(j = 1, \ldots , q\), we have

$$\begin{aligned} 0= & {} \sum _{i = 1}^n \Big (- g_\tau (\epsilon _i) \frac{x_{ij}}{\sqrt{n}} + h_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}^{\text{ T }}\hat{{\varvec{\theta }}}_1}{n}x_{ij} \Big ) + n \Big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{1}{\sqrt{n}} + p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \frac{\hat{\theta }_j}{n} \Big ),\\ \mu _{h_\tau } \sum _{i = 1}^n \frac{\mathbf {x}_{i1}^{\text{ T }}\hat{{\varvec{\theta }}}_1}{n}x_{ij}= & {} \sum _{i=1}^n \Big (g_\tau (\epsilon _i) \frac{x_{ij}}{\sqrt{n}}-\frac{(h_\tau (\epsilon _i) - \mu _{h_\tau }) \mathbf {x}_{i1}^{\text{ T }}\hat{{\varvec{\theta }}}_1}{n}x_{ij} \Big )\\&- n \Big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{1}{\sqrt{n}} + p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \frac{\hat{\theta }_j}{n} \Big ),\\ \mu _{h_\tau } \sum _{i=1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n}\hat{{\varvec{\theta }}}_1= & {} \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} -\sum _{i = 1}^n \frac{(h_\tau (\epsilon _i) - \mu _{h_\tau }) \mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n}\hat{{\varvec{\theta }}}_1 - \mathbf {m}_{\lambda _n}(\hat{{\varvec{\theta }}}_1, {\varvec{\beta }}_{10}),\\ \hat{{\varvec{\theta }}}_1= & {} \left( \mu _{h_\tau } \sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \right) ^{-1} \left( \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} -\sum _{i = 1}^n \frac{(h_\tau (\epsilon _i) - \mu _{h_\tau }) \mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n}\hat{{\varvec{\theta }}}_1\right. \\&\left. -\, \mathbf {m}_{\lambda _n}(\hat{{\varvec{\theta }}}_1, {\varvec{\beta }}_{10}) \right) , \end{aligned}$$

where \(\mathbf {m}_{\lambda _n}(\hat{{\varvec{\theta }}}_1, {\varvec{\beta }}_{10})\) is defined as a q-dimensional vector with the \(j\mathrm{th}\) element \(n \big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{1}{\sqrt{n}} + p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \frac{\hat{\theta }_j}{n} \big )\). According to (13) and Condition 2, as \(n \rightarrow \infty \), \(\mu _{h_\tau } \sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \rightarrow \mu _{h_\tau } \Sigma _{11}\), \(\sum _{i = 1}^n \frac{(h_\tau (\epsilon _i) - \mu _{h_\tau }) \mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \rightarrow 0\), and \(\mathbf {m}_{\lambda _n}(\hat{{\varvec{\theta }}}_1, {\varvec{\beta }}_{10}) \rightarrow 0\). In addition, \(\mathrm {E}\Big (g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}}\Big ) = {\varvec{0}}, i = 1, \ldots , n,\)

$$\begin{aligned}&{\mathrm {Var}}\Big (\sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}}\Big ) = \sigma _{g_\tau }^2 \frac{\sum _{i = 1}^n \mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \rightarrow \sigma _{g_\tau }^2 \Sigma _{11},\\&\sum _{i = 1}^n \mathrm {E}\left( \Vert g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \Vert ^2 I\Big (\Vert g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \Vert > \xi \Big ) \right) \le \displaystyle \sum _{i = 1}^n \frac{\mathrm {E}\Vert g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \Vert ^4}{\xi ^2}\\&\quad = \frac{1}{\xi ^2} \mathrm {E}\Big (g_\tau ^4(\epsilon _i)\Big ) \displaystyle \sum _{i = 1}^n \left( \frac{\mathbf {x}_{i1}^{\text{ T }}\mathbf {x}_{i1}}{n} \right) ^2 \rightarrow 0, \end{aligned}$$

for any \(\xi > 0\). Applying the Lindeberg–Feller CLT, we have

$$\begin{aligned} \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \xrightarrow []{\mathcal { L }} \mathbf {w_1} \sim N({\varvec{0}}, \sigma _{g_\tau }^2 \Sigma _{11}). \end{aligned}$$

By Slutsky’s theorem, \(\hat{{\varvec{\theta }}}_1 \xrightarrow []{\mathcal { L }} \Big (\mu _{h_\tau } \Sigma _{11} \Big )^{-1} \mathbf {w_1}.\) Then, we can conclude,

$$\begin{aligned} \sqrt{n} (\hat{{\varvec{\beta }}}_1^{(\mathrm{SCAD})} - {\varvec{\beta }}_{10}) \xrightarrow []{\mathcal { L }} N({\varvec{0}}, \sigma _{g_\tau }^2/\mu _{h_\tau }^2 \Sigma _{11}^{-1}). \end{aligned}$$

This completes the proof. \(\square \)
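
As a check on the form of the limit, take \(\tau = 1/2\): then \(g_{1/2}(\epsilon _i) = -\epsilon _i\) and \(h_{1/2}(\epsilon _i) \equiv 1\), so \(\sigma _{g_\tau }^2 = {\mathrm {Var}}(\epsilon _i)\) and \(\mu _{h_\tau } = 1\), and the limiting covariance \(\sigma _{g_\tau }^2/\mu _{h_\tau }^2 \Sigma _{11}^{-1}\) reduces to the familiar least squares form \({\mathrm {Var}}(\epsilon _i) \Sigma _{11}^{-1}\).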

1.3 Proof of Theorem 3

We first prove the asymptotic normality in part (b). Let \({\varvec{\theta }} = \sqrt{n} ({\varvec{\beta }} - {\varvec{\beta }}_{0})\). Then, we have

$$\begin{aligned} V_n({\varvec{\theta }})\triangleq & {} R_{n}({\varvec{\beta }}_0 + {\varvec{\theta }}/\sqrt{n}) - R_{n}({\varvec{\beta }}_0) \nonumber \\= & {} \sum _{i = 1}^n \Big ( g_\tau (\epsilon _i) \Big (- \frac{\mathbf {x}_{i}^{\text{ T }}{\varvec{\theta }}}{\sqrt{n}} \Big ) + \frac{h_\tau (\epsilon _i)}{2} \Big ( -\frac{\mathbf {x}_{i}^{\text{ T }}{\varvec{\theta }}}{\sqrt{n}} \Big )^2 + o \Big (\frac{1}{n}\Big ) \Big ) \nonumber \\&+\, n \lambda _n \displaystyle \sum _{j = 1}^p w_j (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert ) \nonumber \\= & {} \sum _{i = 1}^n g_\tau (\epsilon _i) \Big (- \frac{\mathbf {x}_{i}^{\text{ T }}{\varvec{\theta }}}{\sqrt{n}} \Big ) + \frac{\mu _{h_\tau }}{2} {\varvec{\theta }}^{\text{ T }}\sum _{i = 1}^n \frac{\mathbf {x}_{i} \mathbf {x}_{i}^{\text{ T }}}{n} {\varvec{\theta }} + \frac{1}{2} {\varvec{\theta }}^{\text{ T }}\sum _{i = 1}^n \Big ( \frac{(h_\tau (\epsilon _i)-\mu _{h_\tau }) \mathbf {x}_{i} \mathbf {x}_{i}^{\text{ T }}}{n} \Big ) {\varvec{\theta }} \nonumber \\&+\, o_p(1) + n \lambda _n \displaystyle \sum _{j = 1}^p w_j (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert ). \end{aligned}$$
(14)

From the proof of Theorem 2,

$$\begin{aligned}&\displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i}}{\sqrt{n}} \xrightarrow []{\mathcal { L }} \mathbf {w} \sim N({\varvec{0}}, \sigma _{g_\tau }^2 \Sigma ), \\&\frac{\mu _{h_\tau }}{2} \displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_{i} \mathbf {x}_{i}^{\text{ T }}}{n} \rightarrow \frac{\mu _{h_\tau }}{2} \Sigma , \\&\frac{1}{2} \displaystyle \sum _{i = 1}^n \frac{(h_\tau (\epsilon _i)-\mu _{h_\tau }) \mathbf {x}_{i} \mathbf {x}_{i}^{\text{ T }}}{n} \rightarrow 0. \end{aligned}$$

Now we consider the last term of (14). For \(1 \le j \le q\),

$$\begin{aligned} w_j \xrightarrow []{\mathcal { P }} \vert \beta _{j0} \vert ^{-\gamma }, \sqrt{n} \big (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert \big ) \rightarrow \theta _j \text{ sgn }(\beta _{j0}). \end{aligned}$$

By Slutsky’s theorem, \(n \lambda _n \sum _{j = 1}^p w_j\big (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert \big ) \xrightarrow []{\mathcal { P }} 0\) because \(\sqrt{n} \lambda _n \rightarrow 0\) as \(n \rightarrow \infty \). For \(q+1 \le j \le p\), \(\beta _{j0} = 0\) and

$$\begin{aligned} \sqrt{n} (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert ) = \vert \theta _j \vert , \sqrt{n} \lambda _n w_j = \lambda _n n^{(\gamma +1)/2} (\vert \sqrt{n} \tilde{\beta }_j \vert )^{-\gamma }, \end{aligned}$$

where \(\tilde{\beta }_j\) is the \(j\mathrm{th}\) element of \(\tilde{{\varvec{\beta }}}\) defined in (3.5) and \(\sqrt{n} \tilde{\beta }_j = O_p(1)\). Therefore

$$\begin{aligned} n \lambda _n \sum _{j = 1}^p w_j (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert ) {\left\{ \begin{array}{ll} \xrightarrow []{\mathcal { P }} \infty &{}\text{ if } \theta _j \ne 0,\\ = 0 &{} \text{ if } \theta _j = 0. \end{array}\right. } \end{aligned}$$

Applying Slutsky’s theorem again, we have \(V_n({\varvec{\theta }}) \xrightarrow []{\mathcal { L }} V({\varvec{\theta }})\) for every \({\varvec{\theta }}\). Here,

$$\begin{aligned} V({\varvec{\theta }}) = {\left\{ \begin{array}{ll} \displaystyle \frac{\mu _{h_\tau }}{2} {{\varvec{\theta }}_1}^{\text{ T }}\Sigma _{11} {\varvec{\theta }}_1 + \mathbf {w}_1^{\text{ T }}{\varvec{\theta }}_1 &{}\quad \text{ if } \theta _{j} = 0, q+1 \le j \le p, \\ \infty &{}\quad \text{ otherwise. } \end{array}\right. } \end{aligned}$$

where \(\mathbf {w}_1 = (w_1, w_2, \ldots , w_q)^{\text{ T }}\sim N({\varvec{0}}, \sigma _{g_\tau }^2 \Sigma _{11})\) and \( {\varvec{\theta }}_1 = (\theta _1, \theta _2, \ldots , \theta _q)^{\text{ T }}\). We note that \(V_n({\varvec{\theta }})\) is convex and that the unique minimizer of \(V({\varvec{\theta }})\), obtained by solving \(\mu _{h_\tau } \Sigma _{11} {\varvec{\theta }}_1 + \mathbf {w}_1 = {\varvec{0}}\), is

$$\begin{aligned} ((-(\mu _{h_\tau } \Sigma _{11})^{-1} \mathbf {w}_{1})^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}. \end{aligned}$$

With the epi-convergence results of Geyer (1994) and Knight and Fu (2000), we have

$$\begin{aligned} \sqrt{n}(\hat{{\varvec{\beta }}}_1^{(\mathrm{AL})} - {\varvec{\beta }}_{10}) = \hat{{\varvec{\theta }}}_1 \xrightarrow []{\mathcal { L }} -(\mu _{h_\tau } \Sigma _{11})^{-1} \mathbf {w}_{1} \sim N({\varvec{0}}, \sigma _{g_\tau }^2/\mu _{h_\tau }^2 \Sigma _{11}^{-1}) \end{aligned}$$

and

$$\begin{aligned} \sqrt{n} (\hat{{\varvec{\beta }}}_2^{(\mathrm{AL})} - {\varvec{\beta }}_{20}) = \hat{{\varvec{\theta }}}_2 \xrightarrow []{\mathcal { L }} {\varvec{0}} \end{aligned}$$

where \(\hat{{\varvec{\theta }}}_2 = (\hat{\theta }_{q+1}, \hat{\theta }_{q+2}, \ldots , \hat{\theta }_p)^{\text{ T }}, \) which proves the asymptotic normality property.

Next, we show the sparsity property. For any \({\varvec{\beta }}_1\) satisfying \({\varvec{\beta }}_1 - {\varvec{\beta }}_{10} = O_p(n^{-\frac{1}{2}})\) and any \({\varvec{\beta }}_2\) with \(0 < \Vert {\varvec{\beta }}_2 \Vert \le Cn^{-\frac{1}{2}}\), following the proof of Theorem 2, we have

$$\begin{aligned}&R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }}) \nonumber \\&\quad = \displaystyle \sum _{i = 1}^n \left( g_\tau (\epsilon _i) \mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) + \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \Big )^2 + o \Big ((\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}))^2 \Big ) \right) \nonumber \\&\qquad - \displaystyle \sum _{i = 1}^n \left( g_\tau (\epsilon _i) \mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}+ \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\Big )^2 \right. \nonumber \\&\qquad \left. +\, o_p \Big ((\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }})^2 \Big ) \right) - n \lambda _n \displaystyle \sum _{j = q+1}^p w_j \vert \beta _{j} \vert . \end{aligned}$$
(15)

The first two terms are bounded in the same way as in the proof of Theorem 2:

$$\begin{aligned}&\sum _{i = 1}^n g_\tau (\epsilon _i) \mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) = O_p \left( \sqrt{\sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}\sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \sigma _{g_\tau }^2} \right) \\&\quad = O_p(1),\\&\sum _{i = 1}^n g_\tau (\epsilon _i) \mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\\&\quad = O_p \left( \sqrt{\sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }}) \sum _{i = 1}^n \frac{\mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\sigma _{g_\tau }^2} \right) = O_p(1),\\&\sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \Big )^2 \\&\quad = \frac{\mu _{h_\tau }}{2} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}\sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) + o_p(1) = O_p(1),\\&\sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\Big )^2 \\&\quad =\frac{\mu _{h_\tau }}{2} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }}) \sum _{i = 1}^n \frac{\mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}+ o_p(1)= O_p(1). \end{aligned}$$

For the third term on the right-hand side of (15),

$$\begin{aligned} n \lambda _n \displaystyle \sum _{j = q+1}^p w_j \vert \beta _j \vert = n^{(\gamma +1)/2} \lambda _n \displaystyle \sum _{j = q+1}^p (\vert \sqrt{n} \tilde{\beta }_j \vert )^{-\gamma } \sqrt{n} \vert \beta _j \vert \rightarrow \infty , \end{aligned}$$

because \(\sqrt{n} \tilde{\beta }_j = O_p(1)\) and \(n^{(\gamma +1)/2} \lambda _n \rightarrow \infty \). Therefore,

$$\begin{aligned} R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }}) \rightarrow -\infty \; \text{ as } \; n \rightarrow \infty . \end{aligned}$$

This implies \(\hat{{\varvec{\beta }}}_2^{(\mathrm{AL})} = {\varvec{0}}\) with probability tending to one. \(\square \)
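
For completeness, here is one heuristic way to minimize the adaptive-LASSO penalized expectile objective numerically: hold the asymmetric weights fixed, soft-threshold each coordinate of the resulting weighted LASSO problem, then refresh the weights. This sketch is ours and is not the authors' algorithm (the paper solves the convex program with Rmosek); `expectile_fit` refers to the IRLS sketch given after the proof of Theorem 1.

```r
## Heuristic coordinate descent for
##   sum_i rho_tau(y_i - x_i' beta) + n * lambda * sum_j w_j |beta_j|,
## with adaptive weights w_j = |beta_tilde_j|^(-gamma) built from a root-n
## consistent initial fit, e.g., beta_tilde <- expectile_fit(X, y, tau).
adalasso_expectile <- function(X, y, tau, lambda, beta_tilde,
                               gamma = 1, n_iter = 200) {
  n <- nrow(X); p <- ncol(X)
  w <- abs(beta_tilde)^(-gamma)
  beta <- rep(0, p)
  r <- as.vector(y - X %*% beta)
  for (it in seq_len(n_iter)) {
    v <- ifelse(r >= 0, tau, 1 - tau)      # refresh asymmetric weights
    for (j in seq_len(p)) {
      r_j <- r + X[, j] * beta[j]          # partial residual excluding x_j
      a <- sum(v * X[, j]^2)
      z <- sum(v * X[, j] * r_j)
      b_new <- sign(z) * max(abs(z) - n * lambda * w[j] / 2, 0) / a  # soft threshold
      r <- r_j - X[, j] * b_new
      beta[j] <- b_new
    }
  }
  beta
}
```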

1.4 Proof of Corollary 1

From the proof of Theorem 2, it can be shown that

$$\begin{aligned} \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \xrightarrow []{\mathcal { L }} \mathbf {w_1} \sim N({\varvec{0}}, \Sigma _{11}^g), \end{aligned}$$

and \(\hat{{\varvec{\theta }}}_1 \xrightarrow []{\mathcal { L }} \Big (\Sigma _{11}^h \Big )^{-1} \mathbf {w_1}\). \(\square \)

About this article


Cite this article

Liao, L., Park, C. & Choi, H. Penalized expectile regression: an alternative to penalized quantile regression. Ann Inst Stat Math 71, 409–438 (2019). https://doi.org/10.1007/s10463-018-0645-1
