
A coordinate descent algorithm for computing penalized smooth quantile regression


Abstract

The computation of penalized quantile regression estimates is often computationally intensive in high dimensions. In this paper we propose a coordinate descent algorithm for computing penalized smooth quantile regression (cdaSQR) with convex and nonconvex penalties. The cdaSQR approach approximates the objective check function, which is not differentiable at zero, by a modified check function which is differentiable at zero. Then, using the majorization-minimization (MM) trick of the gcdnet algorithm (Yang and Zou in J Comput Graph Stat 22(2):396–415, 2013), each coefficient is updated simply and efficiently. In our implementation, we consider the convex penalty \(\ell _1+\ell _2\) and the nonconvex penalties SCAD (or MCP) \(+\, \ell _2\). We establish the convergence property of cdaSQR with the \(\ell _1+\ell _2\) penalty. Simulations show that our implementation is an order of magnitude faster than its competitors. Finally, the performance of our algorithm is illustrated on three real data sets from diabetes, leukemia and Bardet–Biedl syndrome gene expression studies.
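
The smoothing idea can be illustrated with a short numerical sketch (not from the paper; the parameter values are arbitrary, and the piecewise form of \(\rho _{\tau ,k}\) is read off from the cases used in Appendix 1): the check function is kinked at zero, its smoothed version is differentiable there, and the approximation error shrinks linearly with the bandwidth k.

```python
# Illustrative sketch: compare the quantile check function with its
# smoothed version rho_{tau,k}; tau and k values are arbitrary choices.

def check(u, tau=0.3):
    """Standard quantile check function: kinked (non-differentiable) at u = 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def rho_k(u, tau=0.3, k=0.5):
    """Smoothed check function: quadratic on [-(1-tau)k, tau*k], linear outside."""
    if u < -(1 - tau) * k:
        return -(1 - tau) * u - k * (1 - tau) ** 2 / 2
    elif u <= tau * k:
        return u * u / (2 * k)
    return tau * u - k * tau ** 2 / 2

# The maximal gap over a fine grid shrinks proportionally to k.
for k in (0.5, 0.05, 0.005):
    gap = max(abs(check(i / 1000.0) - rho_k(i / 1000.0, k=k))
              for i in range(-2000, 2001))
    print(f"k = {k}: max |check - smooth| = {gap:.6f}")
```

The worst-case gap is \(\max (\tau ,1-\tau )^2 k/2\), so the smoothed objective can be made arbitrarily close to the original check loss by shrinking k.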



References

  1. Aravkin, A., Kambadur, A., Lozano, A.C., Luss, R.: Sparse quantile Huber regression for efficient and robust estimation (2014). arXiv:1402.4624v1

  2. Belloni, A., Chernozhukov, V.: L\(_1\)-Penalized quantile regression in high-dimensional sparse models. Ann. Stat. 39, 82–130 (2011)

  3. Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5(1), 232–253 (2011)

  4. Briollais, L., Durrieu, G.: Application of quantile regression to recent genetic and -omic studies. Hum. Genet. 133, 951–966 (2014)

  5. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 407–499 (2004)

  6. El Anbari, M., Mkhadri, A.: Penalized regression combining the L1 norm and a correlation based penalty. Sankhya B 76(1), 82–102 (2014)

  7. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  8. Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)

  9. Hebiri, M., van de Geer, S.: The Smooth-Lasso and other \(l_1 + l_2\)-penalized methods. Electron. J. Stat. 5, 1184–1226 (2011)

  10. Huang, J., Breheny, P., Zhang, C.H.: The Mnet method for variable selection. Technical Report No. 402, Department of Statistics, University of Iowa (2010)

  11. Hunter, D.R., Lange, K.: Quantile regression via an MM algorithm. J. Comput. Graph. Stat. 9, 60–77 (2000)

  12. Jennings, L.S., Wong, K.H., Teo, K.L.: Optimal control computation to account for electric movement. J. Aust. Math. Soc. B 38, 182–193 (1996)

  13. Jiang, D., Huang, J.: Majorization minimization by coordinate descent for concave penalized generalized linear models. Department of Biostatistics, University of Iowa, Report No 412 (2012)

  14. Koenker, R.: Quantile Regression. Cambridge University Press, Cambridge (2005)

  15. Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46, 33–50 (1978)

  16. Li, C., Wei, Y., Chappell, R., He, X.: Bent line quantile regression with application to an allometric study of land mammals' speed and mass. Biometrics 67(1), 242–249 (2011)

  17. Li, Y., Zhu, J.: \(L_{1}\)-norm quantile regression. J. Comput. Graph. Stat. 17, 163–185 (2008)

  18. Mazumder, R., Friedman, J.H., Hastie, T.: SparseNet: coordinate descent with nonconvex penalties. J. Am. Stat. Assoc. 106, 1125–1138 (2011)

  19. Oh, S.-K., Lee, T.C.M., Nychka, D.W.: Fast nonparametric quantile regression with arbitrary smoothing methods. J. Comput. Graph. Stat. 20, 510–526 (2011)

  20. Peng, B., Wang, L.: An iterative coordinate descent algorithm for high-dimensional nonconvex penalized quantile regression. J. Comput. Graph. Stat. 24, 676–694 (2015)

  21. Scheetz, T., Kim, K., Swiderski, R., Philp, A., Braun, T., Knudtson, K., Dorrance, A., DiBona, G., Huang, J., Casavant, T., Sheffield, V., Stone, E.: Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc. Natl. Acad. Sci. USA 103(39), 14429–14434 (2006)

  22. Simon, N., Friedman, J., Hastie, T., Tibshirani, R.: A sparse group lasso. J. Comput. Graph. Stat. 22, 231–245 (2013)

  23. Slawski, M.: The structured elastic net for quantile regression and support vector classification. Stat. Comput. 22(1), 153–168 (2012)

  24. Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. B 58, 267–288 (1996)

  25. Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. B 67, 91–108 (2005)

  26. Wu, Y., Liu, Y.: Variable selection in quantile regression. Stat. Sin. 19, 801–817 (2009)

  27. Yang, Y., Zou, H.: An efficient algorithm for computing the HHSVM and its generalizations. J. Comput. Graph. Stat. 22(2), 396–415 (2013)

  28. Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)

  29. Zhao, G.H., Teo, K.L., Chan, K.S.: Estimation of conditional quantiles by a new smoothing approximation of asymmetric loss function. Stat. Comput. 15, 5–11 (2005)

  30. Zheng, S.: Gradient descent algorithms for quantile regression with smooth approximation. Int. J. Mach. Learn. Cybern. 2, 191–207 (2011)

  31. Zou, H., Hastie, T.: Regularization and variable selection via the elastic-net. J. R. Stat. Soc. Ser. B 67, 301–320 (2005)


Acknowledgments

We warmly thank the reviewers for their careful reading of the previous version of our paper and their helpful comments. This work is partially supported by Centre National pour la Recherche Scientifique et Technique (Morocco) project URAC01 to Abdallah Mkhadri and by the Natural Sciences and Engineering Research Council of Canada and Fonds de recherche du Québec–Santé grant FRQS-31110 to Karim Oualkacha.

Author information

Correspondence to Abdallah Mkhadri.

Appendices

Appendix 1

Proof of Proposition 1

The proof of point (a) is omitted since it follows from simple algebra. We detail the proof of point (b) only for the convexity of the function \(\rho \) defined in equation (5) (the proof for the function in (4) is similar and is omitted). It suffices to show that for any u and \(v\in ]-\infty ,-(1-\tau )k[\,\cup \,[-(1-\tau )k, \tau k]\,\cup \,]\tau k, +\infty [=I_1\cup I_2\cup I_3 \), we have

$$\begin{aligned} \rho (v)-\rho (u)-\rho '(u)(v-u)\ge 0. \end{aligned}$$

It is clear that this inequality is satisfied if u and v both lie in the same interval \(I_j\) (\(j=1, 2, 3\)), since \(\rho \) is quadratic or affine on each interval. So, it remains to establish the proposition for any \((u,v)\in I_j\times I_{j'}\) with \(j\ne j'\), which leaves six cases to distinguish.
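
Before walking through the six cases, the first-order inequality can be sanity-checked numerically. The sketch below (illustrative parameter values; the piecewise form of \(\rho \) is read off from the cases that follow) evaluates \(\rho (v)-\rho (u)-\rho '(u)(v-u)\) at random pairs:

```python
import random

def rho(u, tau=0.4, k=0.6):
    # Piecewise form of the smoothed check function used in the proof.
    if u < -(1 - tau) * k:                 # u in I1
        return -(1 - tau) * u - k * (1 - tau) ** 2 / 2
    elif u <= tau * k:                     # u in I2
        return u * u / (2 * k)
    return tau * u - k * tau ** 2 / 2      # u in I3

def rho_prime(u, tau=0.4, k=0.6):
    if u < -(1 - tau) * k:
        return tau - 1
    elif u <= tau * k:
        return u / k
    return tau

random.seed(0)
worst = min(rho(v) - rho(u) - rho_prime(u) * (v - u)
            for u, v in ((random.uniform(-3, 3), random.uniform(-3, 3))
                         for _ in range(100_000)))
print(f"smallest first-order gap over random pairs: {worst:.3e}")
assert worst >= -1e-12  # convexity: the gap is never (meaningfully) negative
```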

  1.

    If \(v\in I_1\) and \(u\in I_2\), then

    $$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = -(1-\tau )v-\frac{k(1-\tau )^2}{2}-\frac{1}{2k}u^2-\frac{u}{k}(v-u)\\&\quad = (1-\tau )\left[ -v-\frac{k(1-\tau )}{2}\right] - \frac{v^2}{2k}+\left( \frac{u}{\sqrt{2k}}-\frac{v}{\sqrt{2k}}\right) ^2\\&\quad = \frac{1}{2k}[(u-2v-k(1-\tau ))(u-v+v+k(1-\tau ))] \\&\quad = \frac{1}{2k}[(u-2v-k(1-\tau ))(u+k(1-\tau ))] \\&\quad \ge 0. \end{aligned}$$

    The last inequality is obtained from the fact that \(u\in I_2=[-(1-\tau )k,\tau k]\). So \((u+k(1-\tau ))\ge 0\), which implies that \(u-2v-k(1-\tau ) \ge -k(1-\tau ) -2v-k(1-\tau )=-2[v+k(1-\tau )] > 0,\) since \(v<-(1-\tau )k\).

  2.

    If \(v\in I_3\) and \(u\in I_2\), then

    $$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = \tau v-\frac{k\tau ^2}{2}-\frac{1}{2k}u^2-\frac{u}{k}(v-u)\\&\quad = \tau v-\frac{k\tau ^2}{2}-\frac{1}{2k}u^2-\frac{uv}{k}+\frac{u^2}{k} + \frac{v^2}{2k}-\frac{v^2}{2k}\\&\quad = \frac{1}{2k}[(u-v)^2-(v-k\tau )^2]\\&\quad = \frac{1}{2k}[(u-2v+k\tau )(u-k\tau )] \\&\quad \ge 0. \end{aligned}$$

    The last inequality comes from the fact that \(u\in I_2\), so \(u-k\tau \le 0\). It implies that \(u-2v+k\tau \le k\tau -2v+k\tau =2(k\tau -v) \le 0 \), since \(v\in I_3\).

  3.

    If \(u\in I_3\) and \(v\in I_1\), then

    $$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = -(1-\tau )v-\frac{k(1-\tau )^2}{2} -\tau u+\frac{k\tau ^2}{2}-\tau (v-u)\\&\quad = -(1-\tau )v-\frac{k(1-\tau )^2}{2}+\frac{k\tau ^2}{2}-\tau v\\&\quad = -v -\frac{k(1-\tau )-k\tau }{2}\\&\quad \ge \frac{2k(1-\tau )-k(1-2\tau )}{2} \quad \text{ since } \quad -v\ge (1-\tau )k\\&\quad = \frac{k}{2} \ge 0 . \end{aligned}$$
  4.

    If \(v\in I_3\) and \(u\in I_1\), then, as in case 3, we have

    $$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = \tau v-\frac{k\tau ^2}{2}+(1-\tau )u \\&\qquad + \frac{k(1-\tau )^2}{2}+(1-\tau )(v-u)\\&\quad = \frac{k(1-2\tau )}{2}+v \\&\quad \ge \frac{k(1-2\tau )+2\tau k}{2}\\&\quad =\frac{k}{2} \quad \text{ since } \quad v>\tau k\\&\quad \ge 0 . \end{aligned}$$
  5.

    If \(v\in I_2\) and \(u\in I_1\), then we have

    $$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = \frac{1}{2k}v^2+(1-\tau )u\\&\qquad +\frac{k(1-\tau )^2}{2}+(1-\tau )(v-u)\\&\quad =\frac{1}{2k}v^2+\frac{k(1-\tau )^2}{2}+(1-\tau )v\\&\quad =\left[ \frac{v}{\sqrt{2k}}+\sqrt{k/2}(1-\tau )\right] ^2 \ge 0. \end{aligned}$$
  6.

    If \(v\in I_2\) and \(u\in I_3\), then, as in case 5, we have

    $$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = \frac{1}{2k}v^2-\tau u+\frac{k\tau ^2}{2}-\tau (v-u)\\&\quad = \left[ \frac{v}{\sqrt{2k}}-\sqrt{k/2}\tau \right] ^2 \ge 0. \end{aligned}$$

This completes the proof of Proposition 1. \(\square \)

Appendix 2

Proof of Proposition 2

We show that the two functions \( \rho _{\tau ,c}(u)\) and \( \rho _{\tau ,k}(u)\) are differentiable and have Lipschitz continuous first derivatives. To do so, one can calculate the derivatives of \( \rho _{\tau ,c}(u)\) and \( \rho _{\tau ,k}(u)\) as

$$\begin{aligned} {\rho ^{\prime }_{\tau ,c}(u)}=\left\{ \begin{array}{ll} \tau -1&{}\quad \text{ for } u<-c\\ (1-\tau )u/c&{}\quad \text{ for } -c \le u < 0\\ \tau u /c&{}\quad \text{ for } 0 \le u < c \\ \tau &{}\quad \text{ for } c \le u \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} {\rho _{\tau ,k}^{\prime }(u)}=\left\{ \begin{array}{ll} \tau - 1 &{}\quad \text{ for } u<-(1-\tau ) k\\ \frac{1}{k}u&{}\quad \text{ for } -(1-\tau ) k \le u \le \tau k\\ \tau &{}\quad \text{ for } u > \tau k. \end{array} \right. \end{aligned}$$

After some algebra, these derivatives satisfy

$$\begin{aligned}&{\mid \rho ^{\prime }_{\tau ,c}(u)- \rho ^{\prime }_{\tau ,c}(v)\mid }\\&\quad =\left\{ \begin{array}{ll} 0 &{}\quad {if}\quad (u<-c,v<-c)~or~(u>c,v>c)\\ \frac{\tau }{c}\mid u - c \mid &{}\quad {if}\quad (0<u<c,v>c)\\ \frac{\tau }{c}\mid c - v \mid &{}\quad {if}\quad (0<v<c,u>c)\\ \frac{1}{c}\mid c\tau - (1-\tau )u \mid &{}\quad {if}\quad (-c<u<0,v>c)\\ \frac{1}{c}\mid c\tau - (1-\tau )v \mid &{}\quad {if}\quad (-c<v<0,u>c)\\ 1 &{}\quad {if}\quad (v<-c,u>c)~or~(u<-c,v>c)\\ \frac{\tau }{c}\mid u - v\mid &{}\quad {if}\quad (0<u<c,0<v<c)\\ \frac{1}{c}\mid \tau u - (1-\tau )v\mid &{}\quad {if}\quad (0<u<c,-c<v<0)\\ \frac{1}{c}\mid (1-\tau ) u - \tau v\mid &{}\quad {if}\quad (0<v<c,-c<u<0)\\ \frac{1}{c}\mid \tau u - (\tau -1) c\mid &{}\quad {if}\quad (0<u<c,v<-c)\\ \frac{1}{c}\mid \tau v - (\tau -1) c\mid &{}\quad {if}\quad (0<v<c,u<-c)\\ \frac{1-\tau }{c}\mid u - v\mid &{}\quad {if}\quad (-c<u<0,-c<v<0)\\ \frac{1-\tau }{c}\mid u + c\mid &{}\quad {if}\quad (-c<u<0,v<-c)\\ \frac{1-\tau }{c}\mid c + v\mid &{}\quad {if}\quad (-c<v<0,u<-c)\end{array} \right. \end{aligned}$$

and

$$\begin{aligned} {\mid \rho ^{\prime }_{\tau ,k}(u)- \rho ^{\prime }_{\tau ,k}(v)\mid }=\left\{ \begin{array}{ll} 0 &{}\quad {if}~~ (u<-(1-\tau ) k,~v<-(1-\tau ) k)~~or~~(u>\tau k,~ v> \tau k)\\ \frac{1}{k}|v+(1-\tau ) k| &{}\quad {if}~~ (u<-(1-\tau ) k,~\tau k> v >-(1-\tau ) k)\\ \frac{1}{k}|u+(1-\tau ) k| &{}\quad {if}~~ (v<-(1-\tau ) k,~\tau k> u >-(1-\tau ) k)\\ \frac{1}{k}|u-\tau k| &{}\quad {if}~~ (v > \tau k,~\tau k> u >-(1-\tau ) k)\\ \frac{1}{k}|v-\tau k| &{}\quad {if}~~ (u > \tau k,~\tau k> v >-(1-\tau ) k)\\ \frac{1}{k}|u-v| &{}\quad {if}~~ (\tau k> u >-(1-\tau ) k,~\tau k> v >-(1-\tau ) k)\\ 1 &{}\quad {if}~~ (u < -(1-\tau ) k,~v > \tau k )~~or~~(v < -(1-\tau ) k,~u > \tau k ).\end{array} \right. \end{aligned}$$

Therefore, one can verify that

$$\begin{aligned} \mid \rho ^{\prime }_{\tau ,c}(u)- \rho ^{\prime }_{\tau ,c}(v)\mid \le \frac{\max (\tau ,1-\tau )}{c}\mid u - v\mid , \end{aligned}$$

and

$$\begin{aligned} \mid \rho ^{\prime }_{\tau ,k}(u)- \rho ^{\prime }_{\tau ,k}(v)\mid \le \frac{1}{k}\mid u - v\mid , \end{aligned}$$

which ends the proof. \(\square \)
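
These Lipschitz bounds are easy to check numerically. A minimal sketch (the parameter values are arbitrary; the derivatives are those displayed above):

```python
import random

def drho_c(u, tau, c):
    # Derivative of rho_{tau,c}, as displayed in the proof.
    if u < -c:
        return tau - 1
    elif u < 0:
        return (1 - tau) * u / c
    elif u < c:
        return tau * u / c
    return tau

def drho_k(u, tau, k):
    # Derivative of rho_{tau,k}, as displayed in the proof.
    if u < -(1 - tau) * k:
        return tau - 1
    elif u <= tau * k:
        return u / k
    return tau

random.seed(1)
tau, c, k = 0.7, 0.4, 0.4
for _ in range(100_000):
    u, v = random.uniform(-2, 2), random.uniform(-2, 2)
    # Lipschitz constant max(tau, 1-tau)/c for rho'_{tau,c} ...
    assert abs(drho_c(u, tau, c) - drho_c(v, tau, c)) \
        <= max(tau, 1 - tau) / c * abs(u - v) + 1e-12
    # ... and 1/k for rho'_{tau,k}.
    assert abs(drho_k(u, tau, k) - drho_k(v, tau, k)) \
        <= abs(u - v) / k + 1e-12
print("Lipschitz bounds hold on all sampled pairs")
```

The constants match the slopes of the middle pieces: \((1-\tau )/c\) and \(\tau /c\) for \(\rho _{\tau ,c}\), and \(1/k\) for \(\rho _{\tau ,k}\).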

Appendix 3

Proof of Theorem 3.1

The function \(\varvec{G}\) is convex since it is the sum of the convex functions \(\rho _{\tau ,*}\) (see Proposition 1) and the enet penalty \(P_{\lambda _1,\lambda _2}\). This completes the proof of point 1.).

The proof of point 2.) of Theorem 3.1 relies on the following result, which is proved in Lemma 1: for all m we have

$$\begin{aligned} \varvec{G}(\bar{{\varvec{\beta }}}^{m}) - \varvec{G}(\bar{{\varvec{\beta }}}^{m+1}) \ge \theta \Vert \bar{{\varvec{\beta }}}^{m}-\bar{{\varvec{\beta }}}^{m+ 1}\Vert _2^2. \end{aligned}$$
(19)

Since the sequence \(\varvec{G}(\bar{{\varvec{\beta }}}^{m})\) is decreasing and bounded below [e.g. \(\varvec{G}(\bar{{\varvec{\beta }}}^{m}) \ge 0\) for all m], it converges. Thus, from (19) one can verify that the sequence generated by Algorithm 1 cannot cycle without converging; i.e. it must have a unique limit point. This completes the proof of point 2.). \(\square \)

Now we prove point 3.). In the sequel, the parameter \(\tau \) is omitted from the parameter vector \({\varvec{\beta }}\) and from \(\beta _0\) to simplify the presentation of the proof. For any \(\bar{{\varvec{\beta }}} = (\beta _0, \beta _1,\ldots ,\beta _p)^\top \) and \(\varvec{\gamma }_j = (0, \ldots , 0,\gamma _j, 0, \ldots , 0)^\top \in I\!\!R^{p+1}\), we have

$$\begin{aligned}&\text{ lim }\inf _{\alpha \downarrow 0+} \frac{\varvec{G}(\bar{{\varvec{\beta }}} + \alpha \varvec{\gamma }_j) - \varvec{G}(\bar{{\varvec{\beta }}})}{\alpha }\nonumber \\&\quad = -\frac{1}{n}\sum ^{n}_{i=1}\rho _{\tau ,*}^{\prime }\{ y_{i}-\beta _0-\mathbf {x}_i^{\top }{\varvec{\beta }} \} x_{ij} \gamma _j + \lambda _2 \beta _j \gamma _j \nonumber \\&\qquad +\, \text{ lim }\inf _{\alpha \downarrow 0+} \frac{|\beta _j + \alpha \gamma _j| - |\beta _j|}{\alpha } \nonumber \\&\quad = -\frac{1}{n}\sum ^{n}_{i=1}\rho _{\tau ,*}^{\prime }\{ y_{i}-\beta _0-\mathbf {x}_i^{\top }{\varvec{\beta }} \} x_{ij} \gamma _j + \lambda _2 \beta _j \gamma _j + \partial {|\beta _j|_{\gamma _j}}\nonumber \\ \end{aligned}$$
(20)

for \(j\in \{1,\ldots ,p\}\), with

$$\begin{aligned} \partial {|\beta _j|_{\gamma _j}} = \left\{ \begin{array}{ll} sgn(\beta _j) \lambda _1 \gamma _j &{}~~~\text{ for }~~~ |\beta _j| > 0; \\ \lambda _1 |\gamma _j| &{}~~~\text{ for } ~~~ \beta _j = 0. \end{array} \right. \end{aligned}$$
(21)

Consider a subsequence \(\bar{{\varvec{\beta }}}^{n_k} \rightarrow \bar{{\varvec{\beta }}}^\infty = (\beta _0^{\infty },\ldots ,\beta _p^{\infty })^\top \). By equation (19), the successive differences converge to zero (i.e. \(\bar{{\varvec{\beta }}}^{n_k} - \bar{{\varvec{\beta }}}^{n_k-1} \rightarrow 0\)). Thus, as \(k \rightarrow \infty \), we have

$$\begin{aligned}&\bar{{\varvec{\beta }}}^{n_k-1}_j = (\beta _0^{n_k}, \ldots , \beta _{j-1}^{n_k},\beta _j^{n_k},\beta _{j+1}^{n_k-1}, \ldots ,\beta _p^{n_k-1}) \nonumber \\&\quad \rightarrow (\beta _0^{\infty }, \ldots , \beta _{j-1}^{\infty },\beta _j^{\infty },\beta _{j+1}^{\infty }, \ldots ,\beta _p^{\infty }). \end{aligned}$$
(22)

By (21) and (22), we have for \(j\in \{1,\ldots ,p\}\)

$$\begin{aligned}&\partial {|\beta _j^{n_k}|_{\gamma _j}} \rightarrow \partial {|\beta _j^{\infty }|_{\gamma _j}}, \; \; \text {if} \; \; \beta _j^{\infty } \ne 0; \quad \partial {|\beta _j^{\infty }|_{\gamma _j}}\nonumber \\&\quad \ge \text{ lim }\inf _{k} \partial {|\beta _j^{n_k}|_{\gamma _j}}, \; \; \text {if} \; \; \beta _j^{\infty } = 0. \end{aligned}$$
(23)

Since \(\beta _j^{n_k}\) is a coordinate-wise minimizer along the j-th coordinate, \(j\in \{1,\ldots ,p\}\), one has

$$\begin{aligned}&-\frac{1}{n}\sum ^{n}_{i=1}\rho _{\tau ,*}^{\prime }\{ y_{i}-\beta _0^{n_k}-\mathbf {x}_i^{\top }{\varvec{\beta }}_j^{n_k-1} \} x_{ij} \gamma _j + \lambda _2 \beta _j^{n_k} \gamma _j \nonumber \\&\quad +\, \partial {|\beta _j^{n_k}|_{\gamma _j}} \ge 0 , \quad \text {for all} \;\; k, \end{aligned}$$
(24)

with \({\varvec{\beta }}_j^{n_k-1} = (\beta _1^{n_k}, \ldots , \beta _{j-1}^{n_k},\beta _j^{n_k},\beta _{j+1}^{n_k-1},\ldots , \beta _p^{n_k-1})^\top \). From (23, 24) one can write, for all \(j\in \{1,\ldots ,p\}\)

$$\begin{aligned}&-\frac{1}{n}\sum ^{n}_{i=1}\rho _{\tau ,*}^{\prime }\{ y_{i}-\beta _0^{\infty } -\mathbf {x}_i^{\top }{\varvec{\beta }}_j^{\infty } \} x_{ij} \gamma _j + \lambda _2 \beta _j^{\infty } \gamma _j + \partial {|\beta _j^{\infty } |_{\gamma _j}}\nonumber \\&\quad \ge \text{ lim }\inf _{k} \left[ -\frac{1}{n}\sum ^{n}_{i=1}\rho _{\tau ,*}^{\prime }\{ y_{i}-\beta _0^{n_k}-\mathbf {x}_i^{\top }{\varvec{\beta }}_j^{n_k-1} \} x_{ij} \gamma _j \right. \nonumber \\&\qquad +\,\left. \lambda _2 \beta _j^{n_k} \gamma _j + \partial {|\beta _j^{n_k}|_{\gamma _j}}\right] \ge 0 \end{aligned}$$
(25)

By (20, 25), for \(j\in \{1,\ldots ,p\}\), we have

$$\begin{aligned} \text{ lim }\inf _{\alpha \downarrow 0+} \frac{\varvec{G}(\bar{{\varvec{\beta }}}^{\infty } + \alpha \varvec{\gamma }_j) - \varvec{G}(\bar{{\varvec{\beta }}}^{\infty })}{\alpha } \ge 0. \end{aligned}$$
(26)

For \(j=0\), following the same arguments as above, one can easily verify that

$$\begin{aligned}&-\frac{1}{n}\sum \limits ^{n}_{i=1}\rho _{\tau ,*}^{\prime }\{ y_{i}-\beta _0^{\infty } -\mathbf {x}_i^{\top }{\varvec{\beta }}_j^{\infty } \} x_{ij} \gamma _0 \ge 0. \end{aligned}$$
(27)

Thus for \({\varvec{\gamma }} = (\gamma _0, \ldots ,\gamma _p)^\top \in I\!\!R^{p+1}\), we have

$$\begin{aligned}&\text{ lim }\inf _{\alpha \downarrow 0+} \frac{\varvec{G}(\bar{{\varvec{\beta }}}^{\infty } + \alpha \varvec{\gamma }) - \varvec{G}(\bar{{\varvec{\beta }}}^{\infty })}{\alpha } \nonumber \\&\quad = -\frac{1}{n}\sum ^{n}_{i=1}\rho _{\tau ,*}^{\prime }\{ y_{i} -\beta _0^{\infty } -\mathbf {x}_i^{\top }{\varvec{\beta }}_j^{\infty } \} x_{ij} \gamma _0 \nonumber \\&\qquad +\, \sum ^{p}_{j=1} \left[ -\frac{1}{n}\sum ^{n}_{i=1}\rho _{\tau ,*}^{\prime }\{ y_{i}-\beta _0^{\infty }-\mathbf {x}_i^{\top }{\varvec{\beta }}^{\infty } \} x_{ij} \gamma _j + \lambda _2 \beta _j^{\infty } \gamma _j \right] \nonumber \\&\qquad +\, \lambda _1\sum ^{p}_{j=1} \text{ lim }\inf _{\alpha \downarrow 0+} \frac{|\beta _j^{\infty } + \alpha \gamma _j| - |\beta _j^{\infty }|}{\alpha } \nonumber \\&\quad = -\frac{1}{n}\sum ^{n}_{i=1}\rho _{\tau ,*}^{\prime }\{ y_{i}-\beta _0^{\infty } -\mathbf {x}_i^{\top }{\varvec{\beta }}_j^{\infty } \} x_{ij} \gamma _0 \nonumber \\&\qquad +\, \sum ^{p}_{j=1} \text{ lim }\inf _{\alpha \downarrow 0+} \frac{\varvec{G}(\bar{{\varvec{\beta }}}^{\infty } + \alpha \varvec{\gamma }_j) - \varvec{G}(\bar{{\varvec{\beta }}}^{\infty })}{\alpha } \nonumber \\&\quad \ge 0, \end{aligned}$$
(28)

where the last inequality (\(\ge 0\)) is obtained using equations (26) and (27). This limit point is a global minimum since the function \(\varvec{G}\) is convex, which completes the proof of point 3.). \(\square \)

Lemma 1

Let \(G(.|\beta _0, {\varvec{\beta }})\) and \(F(.|\beta _0, {\varvec{\beta }})\) be defined as in Theorem 1 and let \(\varvec{G}(\bar{{\varvec{\beta }}})\) be the global objective function defined by (6). Let \(\bar{{\varvec{\beta }}}^m_\tau = \{ \beta _0^m(\tau ), {\varvec{\beta }}^m_\tau \}^\top \) be the sequence of iterates generated by the iteration map of Algorithm 1 of our MM coordinate descent algorithm.

Then the sequence \(\bar{{\varvec{\beta }}}^m\) satisfies equation (19) for all m, i.e.

$$\begin{aligned} \varvec{G}(\bar{{\varvec{\beta }}}^{m}) - \varvec{G}(\bar{{\varvec{\beta }}}^{m+1}) \ge \theta \Vert \bar{{\varvec{\beta }}}^{m}-\bar{{\varvec{\beta }}}^{m+1} \Vert _2^2. \end{aligned}$$

Proof of Lemma 1

The objective function to minimize for each coordinate \(j = 1,\ldots ,p\) can be written as

$$\begin{aligned} \varvec{G}(u|{\varvec{\beta }})=\frac{1}{n}\sum _{i=1}^n \rho _{\tau ,*}(r_i-x_{ij}(\beta _j-u))+ \lambda _1|u|+\frac{\lambda _2}{2}u^2,\nonumber \\ \end{aligned}$$
(29)

where \(r_i=y_i-\beta _0-\mathbf {x}_i^T{\varvec{\beta }}\) and the parameter \(\tau \) is omitted from the parameter vector \({\varvec{\beta }}\) to simplify the presentation of the proof. Let \(u_0\) be the minimizer of \(\varvec{G}(.|{\varvec{\beta }})\) with respect to the specified coordinate \(j=1, \ldots , p\). Following the same arguments as in Mazumder et al. (2011) and Jiang and Huang (2012), we need to show that

$$\begin{aligned} \varvec{G}(u_0+\nu |{\varvec{\beta }}) -\varvec{G}(u_0|{\varvec{\beta }})\ge \theta \nu ^2, \end{aligned}$$
(30)

where \(\theta >0\) and \(\nu \) is a small real value. Note that \(\varvec{G}(.|{\varvec{\beta }})\) is a convex function, since it is a sum of convex functions. For \(u\ne 0\), the gradient of \(\varvec{G}(.|{\varvec{\beta }})\) exists and is equal to

$$\begin{aligned} d=\varvec{G}^{\prime }(u|{\varvec{\beta }})= & {} -\frac{1}{n}\sum _{i=1}^n\rho _{\tau ,*}^{\prime }(r_i -x_{ij}(\beta _j-u))x_{ij}\nonumber \\&\quad + \lambda _1\text{ sgn }(u)+\lambda _2u. \end{aligned}$$
(31)

If \(u=0\), then d can be written as

$$\begin{aligned} d=-\frac{1}{n}\sum _{i=1}^n\rho _{\tau ,*}^{\prime }(r_i-x_{ij}(\beta _j-u)) x_{ij}+ \lambda _1z+\lambda _2u, \end{aligned}$$
(32)

with \(z\in (-1,1)\).

In our coordinate MM descent algorithm, the function to be minimized is

$$\begin{aligned} F(u|{\varvec{\beta }})= & {} \text{ const }+\frac{1}{n}\sum _{i=1}^n\rho _{\tau ,*}^{\prime } (r_i)x_{ij}(\beta _j-u) \\&+ \,\frac{1}{\delta }(\beta _j-u)^2+\lambda _1|u|+\frac{\lambda _2}{2}u^2. \end{aligned}$$

Then any minimizer u of \(F(.|{\varvec{\beta }})\) will satisfy the equation

$$\begin{aligned} 0= & {} -\frac{1}{n}\sum _{i=1}^n\rho _{\tau ,*}^{\prime }(r_i)x_{ij} -\frac{2}{\delta }(\beta _j-u) + \lambda _1\text{ sgn }(u)+\lambda _2u \nonumber \\= & {} -\frac{1}{n}\sum _{i=1}^n\rho _{\tau ,*}^{\prime }(y_i-\beta _0 -{\varvec{\beta }}_{(-j)}^T\mathbf {x}_{i(-j)}-x_{ij}\beta _j)x_{ij} \nonumber \\&\quad -\frac{2}{\delta }(\beta _j-u) + \lambda _1\text{ sgn }(u)+\lambda _2u, \end{aligned}$$
(33)

where \({\varvec{\beta }}_{(-j)}^T\mathbf {x}_{i(-j)}=\sum _{\ell \ne j}\beta _{\ell }x_{i\ell }\). Since \(\varvec{G}(.|{\varvec{\beta }})\) is minimized at \(u_0\), by (32), we have

$$\begin{aligned} 0= & {} -\frac{1}{n}\sum _{i=1}^n\rho _{\tau ,*}^{\prime }(y_i-\beta _0-{\varvec{\beta }}_{(-j)}^T\mathbf {x}_{i(-j)}-x_{ij}(u_0+\nu ))x_{ij} \nonumber \\&\quad -\frac{2}{\delta }(u_0+\nu -u_0) + \lambda _1\text{ sgn }(u_0)+\lambda _2u_0. \end{aligned}$$
(34)

If \(u_0=0\), the latter equation holds for some value of \(\text{ sgn }(u_0)\in (-1,1)\). Now, let \(d_0\) be the sub-gradient of \(\varvec{G}(.|{\varvec{\beta }})\) at \(u_0\), defined by

$$\begin{aligned} d_0= & {} -\frac{1}{n}\sum _{i=1}^n\rho _{\tau ,*}^{\prime }(y_i-\beta _0 -{\varvec{\beta }}_{(-j)}^T\mathbf {x}_{i(-j)}-x_{ij}u_0)x_{ij}\nonumber \\&\quad + \lambda _1\text{ sgn }(u_0)+\lambda _2u_0, \end{aligned}$$
(35)

where \(\text{ sgn }(u_0)\in (-1,1)\). From (34) and (35), one can write

$$\begin{aligned} d_0=\frac{2}{\delta }\nu + \frac{1}{n}\sum _{i=1}^n[\rho _{\tau ,*}^{\prime }(r_i^{(0)}-x_{ij}\nu ) -\rho _{\tau ,*}^{\prime }(r_i^{(0)})]x_{ij}, \end{aligned}$$

where \(r_i^{(0)}=y_i-\beta _0-{\varvec{\beta }}_{(-j)}^T\mathbf {x}_{i(-j)} -x_{ij}u_0\). From equation (31), we can write

$$\begin{aligned}&\varvec{G}(u_0+\nu |{\varvec{\beta }})-\varvec{G} (u_0|{\varvec{\beta }})\ge d_0\nu \nonumber \\&\quad =\left( \frac{2}{\delta }\nu + \frac{1}{n}\sum _{i=1}^n\left[ \rho _{\tau ,*}^{\prime }(r_i^{(0)}-x_{ij}\nu ) -\rho _{\tau ,*}^{\prime }(r_i^{(0)})\right] x_{ij}\right) \nu .\nonumber \\ \end{aligned}$$
(36)

From Proposition 2, we have

$$\begin{aligned} -\frac{|x_{ij}|\nu }{\delta }\le \rho _{\tau ,*}^{\prime }(r_i^{(0)}-x_{ij}\nu ) -\rho _{\tau ,*}^{\prime }(r_i^{(0)})\le \frac{|x_{ij}|\nu }{\delta }. \end{aligned}$$

We assume that, for fixed j, \(x_{ij}\ge 0\) for \(i=1,\ldots , n_0\) and \(x_{ij}< 0\) for \(i=n_0+1,\ldots , n\) (reordering the observations if necessary). Then, we get

$$\begin{aligned}&[\rho _{\tau ,*}^{\prime }(r_i^{(0)}-x_{ij}\nu )-\rho _{\tau ,*}^{\prime }(r_i^{(0)})] x_{ij}\ge -\frac{|x_{ij}|\nu }{\delta }x_{ij}\\&\quad = -\frac{x_{ij}^2\nu }{\delta } \qquad \text{ for } \quad i=1,\ldots , n_0, \end{aligned}$$

and

$$\begin{aligned}&[\rho _{\tau ,*}^{\prime }(r_i^{(0)}-x_{ij}\nu )-\rho _{\tau ,*}^{\prime }(r_i^{(0)})] x_{ij}\ge \frac{|x_{ij}|\nu }{\delta }x_{ij}\\&\quad = -\frac{x_{ij}^2\nu }{\delta } \qquad \text{ for } \quad i=n_0+1,\ldots , n. \end{aligned}$$

Thus, from (36) we obtain that

$$\begin{aligned}&\varvec{G}(u_0+\nu |{\varvec{\beta }})-\varvec{G}(u_0| {\varvec{\beta }})\\&\quad \ge \displaystyle \frac{2\nu ^2}{\delta }+\displaystyle \frac{\nu }{n} \displaystyle \sum _{i=1}^{n_0} \{\rho _{\tau ,*}^{\prime }(r_i^{(0)}-x_{ij}\nu )-\rho _{\tau ,*}^{\prime }(r_i^{(0)})\} x_{ij}\\&\qquad + \displaystyle \frac{\nu }{n}\displaystyle \sum _{i=n_0+1}^{n} \{\rho _{\tau ,*}^{\prime }(r_i^{(0)}-x_{ij}\nu ) -\rho _{\tau ,*}^{\prime }(r_i^{(0)})\}x_{ij}\\&\quad \ge \displaystyle \frac{2\nu ^2}{\delta } - \displaystyle \frac{\nu ^2}{n\delta }\displaystyle \sum _{i=1}^{n_0}x_{ij}^2- \displaystyle \frac{\nu ^2}{n\delta }\displaystyle \sum _{i=n_0+1}^{n}x_{ij}^2\\&\quad = \displaystyle \frac{2\nu ^2}{\delta } - \displaystyle \frac{\nu ^2}{\delta }\left( \displaystyle \frac{1}{n} \displaystyle \sum _{i=1}^{n}x_{ij}^2\right) \\&\quad = \displaystyle \frac{\nu ^2}{\delta } \qquad \left( \text{ since } \quad \displaystyle \frac{1}{n}\displaystyle \sum _{i=1}^{n}x_{ij}^2=1\right) . \end{aligned}$$

Thus, equation (30) holds for every \(\beta _1,\ldots ,\beta _p\).
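
Solving the stationarity condition (33) for u gives a closed-form soft-threshold update for each coordinate. The sketch below is our reading of that closed form, not the authors' implementation; the helper names and the precomputed gradient argument `grad_j` are illustrative assumptions.

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def coordinate_update(beta_j, grad_j, delta, lam1, lam2):
    """One MM coordinate step obtained by solving (33) for u.

    grad_j stands for (1/n) * sum_i rho'(r_i) * x_ij, assumed precomputed
    from the current residuals; delta, lam1, lam2 are as in the text.
    """
    z = (2.0 / delta) * beta_j + grad_j
    return soft_threshold(z, lam1) / (2.0 / delta + lam2)

# A coefficient whose working response z falls below lam1 is set exactly to
# zero, which is how the l1 part of the penalty produces sparsity.
print(coordinate_update(beta_j=0.01, grad_j=0.0, delta=1.0, lam1=0.5, lam2=1.0))
```

Rearranging (33) as \((2/\delta + \lambda _2)u + \lambda _1\,\text{sgn}(u) = (2/\delta )\beta _j + \frac{1}{n}\sum _i\rho _{\tau ,*}^{\prime }(r_i)x_{ij}\) shows why the soft-threshold form appears: the left-hand side is the subgradient of a quadratic plus an absolute value.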

Now consider \(\beta _0\) and observe that

$$\begin{aligned}&\varvec{G}(u+\nu |{\varvec{\beta }}) -\varvec{G}(u|{\varvec{\beta }})\nonumber \\&\quad =\frac{1}{n}\sum _{i=1}^n \rho _{\tau ,*}(r_i^*-u-\nu )-\frac{1}{n}\sum _{i=1}^n\rho _{\tau ,*} (r_i^*-u), \end{aligned}$$

where \(r_i^*=y_i-\mathbf {x}_i^T{\varvec{\beta }}\). Then, using similar arguments as in (34) and (35), we have

$$\begin{aligned} 0=-\displaystyle \frac{1}{n}\displaystyle \sum _{i=1}^{n}\rho _{\tau ,*}^{\prime } (r_i^*-u_0-\nu )-\displaystyle \frac{2}{\delta }\nu , \end{aligned}$$
(37)

and the gradient of \(\varvec{G}(.|{\varvec{\beta }})\) at \(u_0\) is given by

$$\begin{aligned} d_0=-\displaystyle \frac{1}{n}\displaystyle \sum _{i=1}^{n} \rho _{\tau ,*}^{\prime }(r_i^*-u_0). \end{aligned}$$
(38)

As in equation (36), we can write

$$\begin{aligned}&\varvec{G}(u_0+\nu |{\varvec{\beta }})-\varvec{G} (u_0|{\varvec{\beta }})\\&\quad \ge -\displaystyle \frac{\nu }{n}\displaystyle \sum _{i=1}^{n} \rho _{\tau ,*}^{\prime }(r_i^*-u_0)\\&\quad = -\displaystyle \frac{\nu }{n}\displaystyle \sum _{i=1}^{n} [\rho _{\tau ,*}^{\prime }(r_i^*-u_0)\\&\qquad +\,\rho _{\tau ,*}^{\prime }(r_i^*-u_0-\nu ) -\rho _{\tau ,*}^{\prime }(r_i^*-u_0-\nu )]\\&\quad = -\displaystyle \frac{\nu }{n} \displaystyle \sum _{i=1}^{n}\rho _{\tau ,*}^{\prime }(r_i^*-u_0-\nu ) \\&\qquad +\,\displaystyle \frac{\nu }{n}\displaystyle \sum _{i=1}^{n} [\rho _{\tau ,*}^{\prime }(r_i^*-u_0-\nu )-\rho _{\tau ,*}^{\prime }(r_i^*-u_0)]. \end{aligned}$$

Now, using equation (37) and Proposition 2 for the first and second terms of the latter inequality, respectively, we obtain

$$\begin{aligned} \varvec{G}(u_0+\nu |{\varvec{\beta }})-\varvec{G}(u_0 |{\varvec{\beta }})\ge \displaystyle \frac{2\nu ^2}{\delta }-\displaystyle \frac{\nu ^2}{\delta }= \displaystyle \frac{\nu ^2}{\delta }. \end{aligned}$$
(39)

Finally, equation (30) holds with \(\theta =1/\delta \) for \(\beta _0,\beta _1,\ldots ,\beta _p\). Then, we have

$$\begin{aligned} \begin{array}{lll} \varvec{G}({\varvec{\beta }}_j^{m-1}|{\varvec{\beta }}) -\varvec{G}({\varvec{\beta }}_{j+1}^{m-1}|{\varvec{\beta }}) &{}\ge &{} \theta (\beta _{j+1}^{m-1}-\beta _{j+1}^{m})^2\\ &{}=&{}\theta \Vert {\varvec{\beta }}_j^{m-1}-{\varvec{\beta }}_{j+1}^{m-1} \Vert _2^2. \end{array} \end{aligned}$$
(40)

Consequently, applying (40) successively over all coordinates and telescoping, we have for all m

$$\begin{aligned} \varvec{G}(\bar{{\varvec{\beta }}}^{m}) -\varvec{G}(\bar{{\varvec{\beta }}}^{m+1}) \ge \theta \Vert \bar{{\varvec{\beta }}}^{m}-\bar{{\varvec{\beta }}}^{m+1}\Vert _2^2. \end{aligned}$$
(41)

This completes the proof of Lemma 1. \(\square \)



Cite this article

Mkhadri, A., Ouhourane, M. & Oualkacha, K. A coordinate descent algorithm for computing penalized smooth quantile regression. Stat Comput 27, 865–883 (2017). https://doi.org/10.1007/s11222-016-9659-9


Keywords

  • Variable selection
  • Quantile regression
  • Smooth check function
  • Coordinate descent algorithm