Communication-efficient sparse composite quantile regression for distributed data

Abstract

The composite quantile regression (CQR) estimator is a robust and efficient alternative to the M-estimator and the ordinary quantile regression estimator in linear models. To construct a sparse CQR estimator for distributed data, we propose a penalized communication-efficient surrogate loss function that is computationally superior to the original global loss function. The proposed method requires the worker machines only to compute gradients of their unpenalized local losses, and the central machine only to solve a regular estimation problem. We prove that the estimation error of the proposed estimator matches the error bound of the centralized method that analyzes the entire data set simultaneously. A modified alternating direction method of multipliers (ADMM) algorithm is developed to compute the sparse CQR estimator efficiently. The performance of the proposed estimator is studied through simulations, and an application to a real data set is also presented.
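The division of work described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the surrogate has the communication-efficient surrogate likelihood (CSL) form of Jordan, Lee and Yang (2019), \(\tilde L(\theta)=L_1(\theta)-\langle\nabla L_1(\bar\theta)-\nabla L_N(\bar\theta),\theta\rangle\) with \(\theta=({\varvec{b}},{\varvec{\beta}})\), where \(L_1\) is the unpenalized CQR loss on the central machine, \(L_N\) the global loss and \(\bar\theta\) an initial estimate. It further assumes equally sized machines and uses \(\tau_k-I\{r\le 0\}\) as the subgradient of the check loss. The helper names (`cqr_loss`, `cqr_subgradient`, `make_surrogate`) are hypothetical, and the lasso penalty and the ADMM solver are omitted.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Grad = Tuple[List[float], List[float]]  # (gradient w.r.t. b, gradient w.r.t. beta)


def check_loss(r: float, tau: float) -> float:
    """Check loss rho_tau(r) = r * (tau - I{r < 0})."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def cqr_loss(X: Matrix, y: Sequence[float], b: Sequence[float],
             beta: Sequence[float], taus: Sequence[float]) -> float:
    """Unpenalized CQR loss (1/(nK)) * sum_k sum_i rho_{tau_k}(y_i - x_i'beta - b_k)."""
    n, K = len(y), len(taus)
    total = 0.0
    for xi, yi in zip(X, y):
        fit = sum(xij * bj for xij, bj in zip(xi, beta))
        for k, tau in enumerate(taus):
            total += check_loss(yi - fit - b[k], tau)
    return total / (n * K)


def cqr_subgradient(X: Matrix, y: Sequence[float], b: Sequence[float],
                    beta: Sequence[float], taus: Sequence[float]) -> Grad:
    """One subgradient of cqr_loss, taking psi_tau(r) = tau - I{r <= 0}."""
    n, K, p = len(y), len(taus), len(beta)
    g_b, g_beta = [0.0] * K, [0.0] * p
    for xi, yi in zip(X, y):
        fit = sum(xij * bj for xij, bj in zip(xi, beta))
        for k, tau in enumerate(taus):
            psi = tau - (1.0 if yi - fit - b[k] <= 0 else 0.0)
            g_b[k] -= psi / (n * K)
            for j in range(p):
                g_beta[j] -= xi[j] * psi / (n * K)
    return g_b, g_beta


def make_surrogate(machines: Sequence[Tuple[Matrix, Sequence[float]]],
                   b_bar: Sequence[float], beta_bar: Sequence[float],
                   taus: Sequence[float]):
    """Hypothetical CSL-type surrogate on machine 0 after one round of communication."""
    # Every machine (machine 0 included) sends only its unpenalized local gradient.
    grads = [cqr_subgradient(X, y, b_bar, beta_bar, taus) for X, y in machines]
    m = len(grads)
    # With equally sized machines, the average of local gradients is the global gradient.
    glob_b = [sum(g[0][k] for g in grads) / m for k in range(len(b_bar))]
    glob_beta = [sum(g[1][j] for g in grads) / m for j in range(len(beta_bar))]
    # Gradient correction: local gradient minus global gradient at (b_bar, beta_bar).
    shift_b = [loc - glo for loc, glo in zip(grads[0][0], glob_b)]
    shift_beta = [loc - glo for loc, glo in zip(grads[0][1], glob_beta)]
    X1, y1 = machines[0]

    def surrogate(b: Sequence[float], beta: Sequence[float]) -> float:
        linear = (sum(s * v for s, v in zip(shift_b, b))
                  + sum(s * v for s, v in zip(shift_beta, beta)))
        return cqr_loss(X1, y1, b, beta, taus) - linear

    def surrogate_grad(b: Sequence[float], beta: Sequence[float]) -> Grad:
        g_b, g_beta = cqr_subgradient(X1, y1, b, beta, taus)
        return ([g - s for g, s in zip(g_b, shift_b)],
                [g - s for g, s in zip(g_beta, shift_beta)])

    return surrogate, surrogate_grad
```

By construction, the surrogate's gradient at \(\bar\theta\) coincides with the full-data gradient, while its curvature comes from the central machine's data alone; under these assumptions the central machine would then minimize `surrogate` plus \(\lambda\Vert{\varvec{\beta}}\Vert_1\).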

References

  • Belloni A, Chernozhukov V (2011) L1-penalized quantile regression in high-dimensional sparse models. Ann Stat 39:82–130

  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3:1–122

  • Chen X, Xie M (2014) A split-and-conquer approach for analysis of extraordinarily large data. Statistica Sinica 24:1655–1684

  • Dekel O, Gilad-Bachrach R, Shamir O, Xiao L (2012) Optimal distributed online prediction using mini-batches. J Mach Learn Res 13:165–202

  • Fan J, Guo Y, Wang K (2021) Communication-efficient accurate statistical estimation. J Am Stat Assoc, to appear

  • Gu Y, Zou H (2020) Sparse composite quantile regression in ultrahigh dimensions with tuning parameter calibration. IEEE Trans Inf Theory 66:7132–7154

  • Javanmard A, Montanari A (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res 15:2869–2909

  • Jiang R, Hu X, Yu K, Qian W (2018) Composite quantile regression for massive datasets. Statistics 52:980–1004

  • Jordan MI, Lee JD, Yang Y (2019) Communication-efficient distributed statistical inference. J Am Stat Assoc 114:668–681

  • Kai B, Li R, Zou H (2010) Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. J R Stat Soc Ser B Stat Methodol 72:49–69

  • Kai B, Li R, Zou H (2011) New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann Stat 39:305–332

  • Koenker R, Bassett JG (1978) Regression quantiles. Econometrica 46:33–50

  • Koenker R, Ng P (2005) Inequality constrained quantile regression. Sankhya Indian J Stat 67:418–440

  • Koltchinskii V (2011) Oracle inequalities in empirical risk minimization and sparse recovery problems. Springer, New York

  • Lee JD, Liu Q, Sun Y, Taylor JE (2017) Communication-efficient sparse regression. J Mach Learn Res 18:115–144

  • van de Geer S, Bühlmann P, Ritov YA, Dezeure R (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. Ann Stat 42:1166–1202

  • van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York

  • Volgushev S, Chao SK, Cheng G (2019) Distributed inference for quantile regression processes. Ann Stat 47:1634–1662

  • Wang L, Lian H (2020) Communication-efficient estimation of high-dimensional quantile regression. Anal Appl 18:1057–1075

  • Zhang C, Zhang S (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. J R Stat Soc Ser B Stat Methodol 76:217–242

  • Zhang Y, Duchi JC, Wainwright MJ (2013) Communication-efficient algorithms for statistical optimization. J Mach Learn Res 14:3321–3363

  • Zhang Y, Duchi JC, Wainwright MJ (2015) Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates. J Mach Learn Res 16:3299–3340

  • Zhao K, Lian H (2016) A note on the efficiency of composite quantile regression. J Stat Comput Simul 86:1334–1341

  • Zhao W, Zhang F, Lian H (2020) Debiasing and distributed estimation for high-dimensional quantile regression. IEEE Trans Neural Netw Learn Syst 31:2569–2577

  • Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36:1108–1126

Acknowledgements

We are grateful to the Editor, an associate editor and one anonymous referee for their insightful comments and suggestions, which have led to significant improvements. This article was supported by the National Natural Science Foundation of China [Grant Nos. 11871287, 11771144, 11801359], the Natural Science Foundation of Tianjin [Grant No. 18JCYBJC41100], Fundamental Research Funds for the Central Universities [ZB22000102] and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin.

Author information

Corresponding author

Correspondence to Lei Wang.

Appendix

Lemma 1

Under conditions (C1)–(C5), with probability at least \(1-(nK)^{-C}\),

$$\begin{aligned}&\underset{| u| \le r}{\sup } \left| \frac{1}{nK}\sum _{k=1}^{K} \sum _{i=1}^{n} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik}\le 0 \} - F(u)+F(0))\right| \\&\quad \le C\left( \sqrt{\frac{r \log (nK)}{n}} +\frac{\log (nK)}{n}\right) . \end{aligned}$$

Proof of Lemma 1

By the triangle inequality,

$$\begin{aligned}&\underset{| u| \le r}{\sup } \left| \frac{1}{nK}\sum _{k=1}^{K} \sum _{i=1}^{n} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik} \le 0 \} - F(u)+F(0)) \right| \\&\quad \le \underset{| u| \le r}{\sup } \left| \frac{1}{n}\sum _{i=1}^{n} \frac{1}{K}\sum _{k=1}^{K} (I\{ \epsilon _{ik} \le u\} - I \{ \epsilon _{ik} \le 0 \})\right. \\&\qquad \left. -\, E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le u\}\right) + E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le 0\}\right) \right| \\&\qquad + \, \underset{| u| \le r}{\sup } \left| \frac{1}{n}\sum _{i=1}^{n} \frac{1}{K}\sum _{k=1}^{K} (F(u)-F(0))-E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le u\}\right) \right. \\&\left. \qquad + E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le 0\}\right) \right| . \end{aligned}$$

Define the class of functions

$$\begin{aligned} \mathcal {F}_{1} = \left\{ \frac{1}{K}\sum _{k=1}^{K}( I\{\epsilon _{ik} \le u \} - I\{\epsilon _{ik} \le 0 \}):|u| \le r \right\} , \end{aligned}$$

with envelope function \(F({\varvec{{x}}},y) \equiv 1\). By Lemmas 2.6.15 and 2.6.18 in van der Vaart and Wellner (1996), \(\mathcal {F}_{1}\) is a Vapnik-Červonenkis (VC) subgraph class. By Theorem 2.6.7 of van der Vaart and Wellner (1996), we have

$$\begin{aligned} N(\epsilon ,\mathcal {F}_{1}(u),L_{2}(P_{n})) \le \frac{C\Vert F \Vert _{L_{2}(P_{n})}}{\epsilon }. \end{aligned}$$

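Since the functions in \(\mathcal {F}_{1}\) change only when \(u\) crosses one of the \(nK\) points \(\epsilon _{ik}\), on the sample \(u\) effectively takes at most \(nK+1\) distinct values; hence, adjusting \(C\),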

$$\begin{aligned} N(\epsilon ,\mathcal {F}_{1},L_{2}(P_{n})) \le \frac{CnK\Vert F \Vert _{L_{2}(P_{n})}}{\epsilon }. \end{aligned}$$

Let \(\sigma _{1}^2 = \sup _{f\in \mathcal {F}_{1}} Pf^2\). Then, since \(\Vert F \Vert _{L_{2}(P)}\) is bounded by a constant, Theorem 3.12 of Koltchinskii (2011) gives

$$\begin{aligned} E\Vert R_{n} \Vert _{\mathcal {F}_{1}} \le C\left( \sigma _{1}\sqrt{\frac{\log (nK)}{n}}+\frac{\log (nK)}{n}\right) , \end{aligned}$$

where \(\Vert R_{n}\Vert _{\mathcal {F}_{1}} = \sup _{f\in \mathcal {F}_{1}} \left| n^{-1}\sum _{i=1}^{n}\xi _{i}f({\varvec{{x}}}_{i},y_{i})\right| \) with \(\xi _{i}\) being i.i.d. Rademacher random variables independent of the data. By the symmetrization inequality,

$$\begin{aligned} E\Vert P_{n}-P\Vert _{\mathcal {F}_{1}} \le 2E\Vert R_{n} \Vert _{\mathcal {F}_{1}}, \end{aligned}$$

and Talagrand’s inequality in Koltchinskii (2011) gives

$$\begin{aligned} P\left( \Vert P_{n} -P\Vert _{\mathcal {F}_{1}} \ge C\left( \sigma _{1}\sqrt{\frac{\log (nK)}{n}}+\frac{\log (nK)}{n}+\sqrt{\frac{\sigma _{1}^2t}{n}}+\frac{t}{n}\right) \right) \le e^{-t}. \end{aligned}$$

Taking \(t = C\log (nK)\) with a sufficiently large constant, we obtain that, with probability at least \(1-(nK)^{-C}\),

$$\begin{aligned} \Vert P_{n} -P\Vert _{\mathcal {F}_{1}} \le C \left( \sigma _{1}\sqrt{\frac{\log (nK)}{n}}+\frac{\log (nK)}{n}\right) . \end{aligned}$$

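It remains to bound \(\sigma _{1}^{2}\). For \(f\in \mathcal {F}_{1}\), Jensen's inequality gives \(f^2 \le K^{-1}\sum _{k=1}^{K} |I\{\epsilon _{ik} \le u \} - I\{\epsilon _{ik} \le 0 \}|\), so \(Pf^2 \le |F(u)-F(0)| \le C|u|\) when \(F\) has a bounded density; hence \(\sigma _{1}^{2} \le Cr\). Similarly, define the class of functions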

$$\begin{aligned} \mathcal {F}_{2} = \{F(u)-F(0) :|u| \le r \}. \end{aligned}$$

Using similar arguments, it can be shown that

$$\begin{aligned} N(\epsilon ,\mathcal {F}_{2},L_{2}(P_{n})) \le \frac{CnK\Vert F \Vert _{L_{2}(P_{n})}}{\epsilon }, \end{aligned}$$

and then, with probability at least \(1-(nK)^{-C}\), we have

$$\begin{aligned}&\underset{| u| \le r}{\sup } \left| \frac{1}{n}\sum _{i=1}^{n} \frac{1}{K}\sum _{k=1}^{K} (F(u)-F(0))-E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le u\}\right) + E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le 0\}\right) \right| \\&\quad \le C\left( \sigma _{2}\sqrt{\frac{\log (nK)}{n}}+\frac{\log (nK)}{n}\right) , \end{aligned}$$

where \(\sigma _{2}^2 \le Cr^2\). Combining the two bounds and adjusting the constant \(C\), we conclude that, with probability at least \(1-(nK)^{-C}\),

$$\begin{aligned}&\underset{| u| \le r}{\sup } \left| \frac{1}{nK}\sum _{k=1}^{K} \sum _{i=1}^{n} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik}\le 0 \} - F(u)+F(0)) \right| \\&\quad \le C\left( \sqrt{\frac{r \log (nK)}{n}} +\frac{\log (nK)}{n}\right) . \end{aligned}$$

\(\square \)
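The rate in Lemma 1 can be explored by simulation. The sketch below is illustrative only and rests on assumptions not made in the lemma: the errors \(y_i-{\varvec{x}}_i^T{\varvec{\beta}}_0\) are i.i.d. standard normal and \(b_{0k}=\Phi^{-1}(\tau_k)\), so that \(\epsilon_{ik}\) has distribution function \(\Phi(u+b_{0k})\), which plays the role of \(F\); the supremum over \(|u|\le r\) is approximated on a finite grid. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist
from typing import Sequence


def lemma1_deviation(n: int, taus: Sequence[float], r: float,
                     grid_size: int = 101, seed: int = 0) -> float:
    """Grid approximation of the supremum in Lemma 1 for one simulated sample.

    Assumes e_i = y_i - x_i'beta_0 ~ N(0, 1) and b_{0k} = Phi^{-1}(tau_k), so that
    eps_ik = e_i - b_{0k} has distribution function Phi(u + b_{0k}).
    """
    rng = random.Random(seed)
    nd = NormalDist()
    b0 = [nd.inv_cdf(tau) for tau in taus]
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]
    K = len(taus)
    if r == 0:
        grid = [0.0]
    else:
        grid = [-r + 2.0 * r * j / (grid_size - 1) for j in range(grid_size)]
    sup = 0.0
    for u in grid:
        total = 0.0
        for k in range(K):
            eps = [ei - b0[k] for ei in e]
            # Empirical increment sum_i (I{eps_ik <= u} - I{eps_ik <= 0}) ...
            emp = sum((ek <= u) - (ek <= 0) for ek in eps)
            # ... centred by its expectation n * (F_k(u) - F_k(0)).
            total += emp - n * (nd.cdf(u + b0[k]) - nd.cdf(b0[k]))
        sup = max(sup, abs(total) / (n * K))
    return sup


def lemma1_rate(n: int, K: int, r: float) -> float:
    """Rate sqrt(r log(nK) / n) + log(nK) / n from Lemma 1, without the constant C."""
    return math.sqrt(r * math.log(n * K) / n) + math.log(n * K) / n
```

For example, printing `lemma1_deviation(n, taus, r) / lemma1_rate(n, len(taus), r)` for n = 250, 500, 1000 indicates whether the ratio stays bounded, as the lemma predicts; the pure-Python loops make large n slow.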

Proof of Theorem 1

Step 1 Let \({\varvec{{\delta }}}= \check{{\varvec{{b}}}}-{\varvec{{b}}}_{0}\) and \({\varvec{{\Delta }}}= \check{{\varvec{{\beta }}}}-{\varvec{{\beta }}}_{0}\). Since \(\tilde{L}({\varvec{{b}}},{\varvec{{\beta }}})\) is convex, we have

$$\begin{aligned} \tilde{L}({\varvec{{b}}},{\varvec{{\beta }}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \ge \nabla _{{\varvec{{\beta }}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) ( {\varvec{{\beta }}}-{\varvec{{\beta }}}_{0})+\nabla _{{\varvec{{b}}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) ( {\varvec{{b}}}-{\varvec{{b}}}_{0}), \end{aligned}$$

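for all \({\varvec{{b}}}\) and \({\varvec{{\beta }}}\). Since \((\check{{\varvec{{b}}}},\check{{\varvec{{\beta }}}})\) minimizes the penalized surrogate loss \(\tilde{L}({\varvec{{b}}},{\varvec{{\beta }}}) + \lambda \Vert {\varvec{{\beta }}}\Vert _{1}\), we have

$$\tilde{L}( \check{{\varvec{{b}}}},\check{{\varvec{{\beta }}}}) + \lambda \Vert \check{{\varvec{{\beta }}}} \Vert _{1} \le \tilde{L}( {\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) + \lambda \Vert {\varvec{{\beta }}}_{0} \Vert _{1}.$$

Combining this with the convexity inequality above and Hölder's inequality, we get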

$$\begin{aligned}&-\Vert \nabla _{{\varvec{{b}}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \Vert _{\infty } \Vert {\varvec{{\delta }}}\Vert _{1} -\Vert \nabla _{{\varvec{{\beta }}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \Vert _{\infty } \Vert {\varvec{{\Delta }}}\Vert _{1} \\&\quad \le \tilde{L}(\check{{\varvec{{b}}}},\check{{\varvec{{\beta }}}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\le \lambda \Vert {\varvec{{\beta }}}_{0} \Vert _{1} - \lambda \Vert {\varvec{{\beta }}}_{0} +{\varvec{{\Delta }}}\Vert _{1}. \end{aligned}$$

On the event

$$\begin{aligned} \mathcal {A}_{1} = \left\{ \Vert \nabla _{{\varvec{{b}}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\Vert _{\infty } \le 3\lambda /(2K), \Vert \nabla _{{\varvec{{\beta }}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\Vert _{\infty } \le \lambda /2 \right\} , \end{aligned}$$

this leads to

$$\begin{aligned} -\frac{3\lambda }{2K} \Vert {\varvec{{\delta }}}\Vert _{1} -\frac{\lambda }{2} \Vert {\varvec{{\Delta }}}\Vert _{1} \le \lambda \Vert {\varvec{{\beta }}}_{0} \Vert _{1}- \lambda \Vert {\varvec{{\beta }}}_{0} +{\varvec{{\Delta }}}\Vert _{1}. \end{aligned}$$

Writing \(\Vert {\varvec{{\Delta }}}\Vert _{1} = \Vert {\varvec{{\Delta }}}_{S} \Vert _{1}+\Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1}\), \(\Vert {\varvec{{\beta }}}_{0} \Vert _{1} =\Vert {\varvec{{\beta }}}_{0S} \Vert _{1}\) and \(\Vert {\varvec{{\beta }}}_{0} +{\varvec{{\Delta }}}\Vert _{1}=\Vert {\varvec{{\beta }}}_{0S} +{\varvec{{\Delta }}}_{S} \Vert _{1} + \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \), we get

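$$\begin{aligned} -\frac{3\lambda }{2K} \Vert {\varvec{{\delta }}}\Vert _{1}-\frac{\lambda }{2} \Vert {\varvec{{\Delta }}}_{S} \Vert _{1} -\frac{\lambda }{2} \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \le \lambda \Vert {\varvec{{\Delta }}}_{S} \Vert _{1} - \lambda \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1}, \end{aligned}$$

where we used the triangle inequality \(\Vert {\varvec{{\beta }}}_{0S} \Vert _{1}-\Vert {\varvec{{\beta }}}_{0S} +{\varvec{{\Delta }}}_{S} \Vert _{1}\le \Vert {\varvec{{\Delta }}}_{S} \Vert _{1}\). Dividing by \(\lambda >0\) and rearranging gives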

$$\begin{aligned} \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \le 3\Vert {\varvec{{\Delta }}}_{S} \Vert _{1}+ \frac{3}{K}\Vert {\varvec{{\delta }}}\Vert _{1}. \end{aligned}$$

Arguing as in Lemma 3 of Gu and Zou (2020), we obtain

$$\begin{aligned} \mathrm {Pr}(\mathcal {A}_{1}) \ge 1-2K\exp {\left( -\frac{9N\lambda ^2}{2}\right) } -2p\exp {\left( -\frac{N\lambda ^2}{2M_{0}}\right) }. \end{aligned}$$

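Step 2 Because \(\tilde{L}\) differs from the local loss \(L_{1}\) only by a term that is linear in \(({\varvec{{b}}},{\varvec{{\beta }}})\), it is easily verified that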

$$\begin{aligned}&\tilde{L}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\delta }}}^T\nabla _{{\varvec{{b}}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\Delta }}}^T\nabla _{{\varvec{{\beta }}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\\&\quad =L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\delta }}}^T\nabla _{{\varvec{{b}}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\Delta }}}^T\nabla _{{\varvec{{\beta }}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}). \end{aligned}$$

Define \(\epsilon _{ik} = y_{i}-{\varvec{{x}}}_{i}^T{\varvec{{\beta }}}_{0}-b_{0k}\). Using Knight’s identity, we have

$$\begin{aligned} | x-y|- |x |=-y(I(x>0)-I(x<0))+2\int _{0}^{y}[I(x\le t)-I(x\le 0)]dt, \end{aligned}$$

which yields

$$\begin{aligned} \rho _{\tau }(x-y) - \rho _{\tau }(x)= -y(\tau -I \{ x \le 0 \}) + \int _{0}^{y} (I\{ x \le u\} - I\{x \le 0 \}) du. \end{aligned}$$
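The check-loss form of Knight's identity can be verified numerically. The following sketch (function names hypothetical) evaluates both sides for given \(x\), \(y\) and \(\tau\), using \(\rho_\tau(r)=r(\tau-I\{r<0\})\) and computing the integral with a midpoint rule on a signed step, so that the sum approximates \(\int_0^y\) for either sign of \(y\).

```python
def rho(r: float, tau: float) -> float:
    """Check loss rho_tau(r) = r * (tau - I{r < 0})."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def knight_lhs(x: float, y: float, tau: float) -> float:
    """Left-hand side: rho_tau(x - y) - rho_tau(x)."""
    return rho(x - y, tau) - rho(x, tau)


def knight_rhs(x: float, y: float, tau: float, steps: int = 20000) -> float:
    """Right-hand side: -y(tau - I{x <= 0}) + int_0^y (I{x <= u} - I{x <= 0}) du."""
    h = y / steps  # signed step: the Riemann sum below approximates int_0^y, also for y < 0
    ind0 = 1.0 if x <= 0 else 0.0
    integral = 0.0
    for j in range(steps):
        u = (j + 0.5) * h  # midpoint of the j-th subinterval between 0 and y
        integral += ((1.0 if x <= u else 0.0) - ind0) * h
    return -y * (tau - ind0) + integral


if __name__ == "__main__":
    for x, y, tau in [(0.5, 1.0, 0.3), (-0.4, -1.0, 0.7), (1.2, 0.4, 0.5), (-0.3, 0.8, 0.9)]:
        print(x, y, tau, knight_lhs(x, y, tau), knight_rhs(x, y, tau))
```

For instance, with \(\tau=0.3\), \(x=0.5\) and \(y=1\), both sides equal 0.2 up to quadrature error: the left side is \(\rho_{0.3}(-0.5)-\rho_{0.3}(0.5)=0.35-0.15\), and the right side is \(-0.3\) plus an integral equal to 0.5.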

Applying this identity with \(x=\epsilon _{ik}\) and \(y={\varvec{{x}}}_{i}^{T}{\varvec{{\Delta }}}+\delta _{k}\) gives

$$\begin{aligned}&\rho _{\tau _{k}}(y_{i}-{\varvec{{x}}}_{i}^{T}({\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}) -(b_{0k}+\delta _{k}))- \rho _{\tau _{k}}(y_{i}-{\varvec{{x}}}_{i}^{T}{\varvec{{\beta }}}_{0}-b_{0k})\\&\quad +\, {\varvec{{x}}}_{i}^T{\varvec{{\Delta }}}(\tau _{k}-I\{ \epsilon _{ik} \le 0 \})+\delta _{k}(\tau _{k}-I\{ \epsilon _{ik} \le 0 \})\\&\quad =\int _{0}^{{\varvec{{x}}}_{i}^{T}{\varvec{{\Delta }}}+\delta _{k}} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik} \le 0 \}) du. \end{aligned}$$

Averaging this identity over \(i\) and \(k\) and subtracting its expectation, we obtain

$$\begin{aligned}&L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\delta }}}^T\nabla _{{\varvec{{b}}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\Delta }}}^T\nabla _{{\varvec{{\beta }}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\\&\quad -\, EL_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})+\, EL_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\\&\quad =\frac{1}{nK} \sum _{k=1}^{K}\sum _{i=1}^{n}\int _{0}^{{\varvec{{x}}}_{i}^{T}{\varvec{{\Delta }}}+\delta _{k}} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik} \le 0 \} - F(u)+F(0))du. \end{aligned}$$

Let

$$\begin{aligned} \mathcal {A}_{2}&= \left\{ \underset{| u| \le r}{\sup } \left| \frac{1}{nK}\sum _{k=1}^{K} \sum _{i=1}^{n} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik} \le 0 \} - F(u)+F(0))\right| \right. \\&\left. \le C\left( \sqrt{\frac{r \log (nK)}{n}} +\frac{\log (nK)}{n}\right) \right\} . \end{aligned}$$

By Lemma 1, for any \(r >0\),

$$\begin{aligned} \mathrm {Pr}(\mathcal {A}_{2}) \ge 1-(nK)^{-C}. \end{aligned}$$

Using the facts that \(\Vert {\varvec{{\Delta }}}\Vert _2 +\Vert {\varvec{{\delta }}}\Vert _2 \le t\) and

$$\begin{aligned} \max _{i,k} | {\varvec{{x}}}_{i}^{T}{\varvec{{\Delta }}}+\delta _{k} |&\le c_{n}\Vert {\varvec{{\Delta }}}\Vert _{1} +\Vert {\varvec{{\delta }}}\Vert _{1} \le 4c_{n}\Vert {\varvec{{\Delta }}}_{S} \Vert _{1} +\left( 1+\frac{3}{K}\right) \Vert {\varvec{{\delta }}}\Vert _{1}\\&\le 4c_{n}\sqrt{s} \Vert {\varvec{{\Delta }}}\Vert _2 +(K+3)\Vert {\varvec{{\delta }}}\Vert _2\le 4c_{n}\sqrt{s} \Vert {\varvec{{\Delta }}}\Vert _2 + (K+3)(t-\Vert {\varvec{{\Delta }}}\Vert _2) \le 4c_{n}\sqrt{s}t, \end{aligned}$$

we get

$$\begin{aligned}&\underset{\underset{\Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \le 3\Vert {\varvec{{\Delta }}}_{S} \Vert _{1}+\frac{3}{K}\Vert {\varvec{{\delta }}}\Vert _{1}}{\Vert {\varvec{{\Delta }}}\Vert _2 +\Vert {\varvec{{\delta }}}\Vert _2 \le t} }{\sup }\left| L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\delta }}}^T\nabla _{{\varvec{{b}}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\right. \\&\qquad \left. -\, {\varvec{{\Delta }}}^T\nabla _{{\varvec{{\beta }}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-EL_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})+EL_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \right| \\&\quad \le C\int _{0}^{4c_{n}\sqrt{s}t} \left( \sqrt{\frac{r\log (nK)}{n} }+\frac{\log (nK)}{n}\right) dr\\&\quad \le C\left( \frac{(c_{n}\sqrt{s}t)^{3/2} \sqrt{\log (nK)}}{\sqrt{n}} + \frac{(c_{n}\sqrt{s}t)\log (nK)}{n}\right) . \end{aligned}$$
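The integral in the preceding display can be evaluated explicitly. Writing \(R=4c_{n}\sqrt{s}t\) and using \(\int _{0}^{R}\sqrt{r}\,dr=\tfrac{2}{3}R^{3/2}\) together with \(R^{3/2}=8(c_{n}\sqrt{s}t)^{3/2}\),

$$\begin{aligned} \int _{0}^{R}\left( \sqrt{\frac{r\log (nK)}{n}}+\frac{\log (nK)}{n}\right) dr&= \frac{2}{3}R^{3/2}\sqrt{\frac{\log (nK)}{n}} + R\,\frac{\log (nK)}{n}\\&= \frac{16}{3}\,\frac{(c_{n}\sqrt{s}t)^{3/2}\sqrt{\log (nK)}}{\sqrt{n}} + \frac{4(c_{n}\sqrt{s}t)\log (nK)}{n}, \end{aligned}$$

so the final bound holds once its constant absorbs the factors 16/3 and 4.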

Step 3 Step 1 implies

$$\begin{aligned} \underset{\underset{\Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \le 3\Vert {\varvec{{\Delta }}}_{S} \Vert _{1}+\frac{3}{K}\Vert {\varvec{{\delta }}}\Vert _{1}}{\Vert {\varvec{{\Delta }}}\Vert _2 +\Vert {\varvec{{\delta }}}\Vert _2 \le t} }{\inf }\tilde{L}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})+\lambda \Vert {\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}\Vert _{1}-\lambda \Vert {\varvec{{\beta }}}_{0}\Vert _{1}\le 0. \end{aligned}$$

We have \(\Vert {\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}\Vert _1-\Vert {\varvec{{\beta }}}_{0}\Vert _1\ge -\Vert {\varvec{{\Delta }}}_S\Vert _1\ge -\sqrt{s}\Vert {\varvec{{\Delta }}}_{S}\Vert _2\ge -\sqrt{s}t\). Furthermore, bounding \(E[ L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}) ]-E[ L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})]\) from below via Eq. (3.7) of Belloni and Chernozhukov (2011) and using the results of the previous steps, we have

$$\begin{aligned}&\tilde{L}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \\&\quad \ge E[ L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}) ]-E[ L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})]\\&\qquad -\, \Vert {\varvec{{\Delta }}}\Vert _{1}\Vert \nabla _{{\varvec{{\beta }}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\Vert _{\infty }-\Vert {\varvec{{\delta }}}\Vert _{1}\Vert \nabla _{{\varvec{{b}}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\Vert _{\infty }\\&\qquad -\, C\left( \frac{(c_{n}\sqrt{s}t)^{3/2} \sqrt{\log (nK)}}{\sqrt{n}} + \frac{(c_{n}\sqrt{s}t)\log (nK)}{n}\right) \\&\quad \ge C(t^2\wedge t) -C\lambda \sqrt{s}t- C\left( \frac{(c_{n}\sqrt{s}t)^{3/2} \sqrt{\log (nK)}}{\sqrt{n}} + \frac{(c_{n}\sqrt{s}t)\log (nK)}{n}\right) . \end{aligned}$$

Thus, we have

$$\begin{aligned} C(t^2\wedge t) -C\lambda \sqrt{s}t - C\left( \frac{(c_{n}\sqrt{s}t)^{3/2} \sqrt{\log (nK)}}{\sqrt{n}} + \frac{(c_{n}\sqrt{s}t)\log (nK)}{n}\right) \le 0, \end{aligned}$$

which implies

$$\begin{aligned} t\le C\left( \lambda \sqrt{s}+\frac{c_{n}\sqrt{s}\log {(nK)}}{n}+\frac{s^{3/2} c_{n}^{2} {\log (nK)}}{n}\right) \le C\left( \lambda \sqrt{s}+\frac{s^{3/2} c_{n}^{2} {\log (nK)}}{n}\right) . \end{aligned}$$

Consequently, on the event \(\mathcal {A}_{1}\cap \mathcal {A}_{2}\), whose probability satisfies

$$\begin{aligned} \mathrm {Pr}(\mathcal {A}_{1}\cap \mathcal {A}_{2})&\ge 1- \mathrm {Pr}(\mathcal {A}_{1}^c)-\mathrm {Pr}(\mathcal {A}_{2}^c)\\&\ge 1- 2K\exp {\left( -\frac{9N\lambda ^2}{2}\right) } -2p\exp {\left( -\frac{N\lambda ^2}{2M_{0}}\right) }-(nK)^{-C}, \end{aligned}$$

we have

$$\begin{aligned} \Vert \check{{\varvec{{\beta }}}}-{\varvec{{\beta }}}_{0} \Vert _{2} \le t. \end{aligned}$$

The second result follows by noting that \(\Vert {\varvec{{\Delta }}}\Vert _{1} \le 4\Vert {\varvec{{\Delta }}}_{S}\Vert _{1}+\frac{3}{K}\Vert {\varvec{{\delta }}}\Vert _{1} \le C \sqrt{s}t\), so that

$$\begin{aligned} \Vert \check{{\varvec{{\beta }}}}-{\varvec{{\beta }}}_{0} \Vert _{1} \le C\sqrt{s}t\le C \left( \lambda s+\frac{s^{2}c_{n}^2 \log (nK)}{n}\right) . \end{aligned}$$

\(\square \)

Cite this article

Yang, Y., Wang, L. Communication-efficient sparse composite quantile regression for distributed data. Metrika (2022). https://doi.org/10.1007/s00184-022-00868-z

  • DOI: https://doi.org/10.1007/s00184-022-00868-z

Keywords

  • ADMM
  • Communication-efficient surrogate loss
  • Composite quantile regression
  • Distributed estimation
  • Lasso penalty