Communication-efficient sparse composite quantile regression for distributed data

Yang, Yaohong; Wang, Lei

doi:10.1007/s00184-022-00868-z

Published: 16 June 2022

Volume 86, pages 261–283, (2023)
Metrika

Abstract

The composite quantile regression (CQR) estimator is a robust and efficient alternative to the M-estimator and the ordinary quantile regression estimator in linear models. To construct a sparse CQR estimator for distributed data, we propose a penalized communication-efficient surrogate loss function that is computationally superior to the original global loss function. The proposed method only requires the worker machines to compute gradients based on their local data, without a penalty, and the central machine to solve a regular penalized estimation problem. We prove that the estimation errors of the proposed method match the estimation error bound of the centralized method, which analyzes the entire data set simultaneously. A modified alternating direction method of multipliers algorithm is developed to obtain the sparse CQR estimator efficiently. The performance of the proposed estimator is studied through simulation, and an application to a real data set is also presented.
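To make the communication pattern in the abstract concrete, the following is a minimal Python sketch. It assumes that the surrogate loss takes the gradient-shifted form of Jordan et al. (2019), which is listed in the references; the abstract itself does not state the formula. The function names, the initial estimate `(b_bar, beta_bar)`, the toy data, and the equal local sample sizes are all illustrative assumptions rather than the authors' implementation. The $\ell_1$ penalty and the ADMM solver are omitted.

```python
import numpy as np

def check_loss(r, tau):
    """Quantile check loss rho_tau(r) = r * (tau - I{r < 0})."""
    return r * (tau - (r < 0))

def cqr_loss(b, beta, X, y, taus):
    """Unpenalized CQR loss, averaged over n observations and K quantile levels."""
    resid = y[:, None] - (X @ beta)[:, None] - b[None, :]  # shape (n, K)
    return np.mean(check_loss(resid, taus[None, :]))

def cqr_subgradient(b, beta, X, y, taus):
    """A subgradient of cqr_loss with respect to (b, beta)."""
    resid = y[:, None] - (X @ beta)[:, None] - b[None, :]
    psi = (resid <= 0).astype(float) - taus[None, :]  # I{r <= 0} - tau_k
    n, K = resid.shape
    return psi.sum(axis=0) / (n * K), X.T @ psi.sum(axis=1) / (n * K)

def gradient_shift(b_bar, beta_bar, machines, taus):
    """Gradient on machine 0 minus the average gradient over all machines.

    Each worker only sends its local gradient; equal local sample sizes
    are assumed so that the plain average equals the pooled gradient.
    """
    grads = [cqr_subgradient(b_bar, beta_bar, X, y, taus) for X, y in machines]
    avg_b = np.mean([g[0] for g in grads], axis=0)
    avg_beta = np.mean([g[1] for g in grads], axis=0)
    return grads[0][0] - avg_b, grads[0][1] - avg_beta

def surrogate_loss(b, beta, machines, taus, shift):
    """Local loss on machine 0 minus a linear gradient-correction term."""
    X1, y1 = machines[0]
    shift_b, shift_beta = shift
    return cqr_loss(b, beta, X1, y1, taus) - shift_b @ b - shift_beta @ beta

# Toy data (hypothetical): m machines, n observations each, heavy-tailed errors.
rng = np.random.default_rng(0)
p, n, m = 5, 200, 4
beta0 = np.array([1.0, -2.0, 0.0, 0.0, 0.0])
machines = []
for _ in range(m):
    X = rng.normal(size=(n, p))
    machines.append((X, X @ beta0 + rng.standard_t(df=3, size=n)))

taus = np.array([0.25, 0.5, 0.75])
b_bar, beta_bar = np.zeros(len(taus)), np.zeros(p)
shift = gradient_shift(b_bar, beta_bar, machines, taus)
```

Under this construction the surrogate differs from the local loss only by a linear term, so its subgradient at the initial estimate coincides with the pooled-data subgradient, while only gradient vectors of length $K+p$ are communicated.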

References

Belloni A, Chernozhukov V (2011) L1-penalized quantile regression in high-dimensional sparse models. Ann Stat 39:82–130
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3:1–122
Chen X, Xie M (2014) A split-and-conquer approach for analysis of extraordinarily large data. Statistica Sinica 24:1655–1684
Dekel O, Gilad-Bachrach R, Shamir O, Xiao L (2012) Optimal distributed online prediction using mini-batches. J Mach Learn Res 13:165–202
Fan J, Guo Y, Wang K (2021) Communication-efficient accurate statistical estimation. J Am Stat Assoc, to appear
Gu Y, Zou H (2020) Sparse composite quantile regression in ultrahigh dimensions with tuning parameter calibration. IEEE Trans Inf Theory 66:7132–7154
Javanmard A, Montanari A (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res 15:2869–2909
Jiang R, Hu X, Yu K, Qian W (2018) Composite quantile regression for massive datasets. Statistics 52:980–1004
Jordan MI, Lee JD, Yang Y (2019) Communication-efficient distributed statistical inference. J Am Stat Assoc 114:668–681
Kai B, Li R, Zou H (2010) Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. J R Stat Soc Ser B Stat Methodol 72:49–69
Kai B, Li R, Zou H (2011) New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann Stat 39:305–332
Koenker R, Bassett JG (1978) Regression quantiles. Econometrica 46:33–50
Koenker R, Ng P (2005) Inequality constrained quantile regression. Sankhya Indian J Stat 67:418–440
Koltchinskii V (2011) Oracle inequalities in empirical risk minimization and sparse recovery problems. Springer, New York
Lee JD, Liu Q, Sun Y, Taylor JE (2017) Communication-efficient sparse regression. J Mach Learn Res 18:115–144
van de Geer S, Bühlmann P, Ritov YA, Dezeure R (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. Ann Stat 42:1166–1202
van Der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York
Volgushev S, Chao SK, Cheng G (2019) Distributed inference for quantile regression processes. Ann Stat 47:1634–1662
Wang L, Lian H (2020) Communication-efficient estimation of high-dimensional quantile regression. Anal Appl 18:1057–1075
Zhang C, Zhang S (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. J R Stat Soc Ser B Stat Methodol 76:217–242
Zhang Y, Duchi JC, Wainwright MJ (2013) Communication-efficient algorithms for statistical optimization. J Mach Learn Res 14:3321–3363
Zhang Y, Duchi JC, Wainwright MJ (2015) Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates. J Mach Learn Res 16:3299–3340
Zhao K, Lian H (2016) A note on the efficiency of composite quantile regression. J Stat Comput Simul 86:1334–1341
Zhao W, Zhang F, Lian H (2020) Debiasing and distributed estimation for high-dimensional quantile regression. IEEE Trans Neural Netw Learn Syst 31:2569–2577
Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36:1108–1126

Acknowledgements

We are grateful to the Editor, an associate editor and one anonymous referee for their insightful comments and suggestions, which have led to significant improvements. This article was supported by the National Natural Science Foundation of China [Grant Nos. 11871287, 11771144, 11801359], the Natural Science Foundation of Tianjin [Grant No. 18JCYBJC41100], Fundamental Research Funds for the Central Universities [ZB22000102] and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin.

Author information

Authors and Affiliations

School of Statistics and Data Science and LPMC, Nankai University, Tianjin, 300071, People’s Republic of China
Yaohong Yang & Lei Wang

Corresponding author

Correspondence to Lei Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Lemma 1

Under conditions (C1)–(C5), with probability at least $1-(nK)^{-C}$,

$$\begin{aligned}&\underset{| u| \le r}{\sup } \left| \frac{1}{nK}\sum _{k=1}^{K} \sum _{i=1}^{n} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik}\le 0 \} - F(u)+F(0))\right| \\&\quad \le C\left( \sqrt{\frac{r \log (nK)}{n}} +\frac{\log (nK)}{n}\right) . \end{aligned}$$

Proof of Lemma 1

Firstly, we write

$$\begin{aligned}&\underset{| u| \le r}{\sup } \left| \frac{1}{nK}\sum _{k=1}^{K} \sum _{i=1}^{n} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik} \le 0 \} - F(u)+F(0)) \right| \\&\quad \le \underset{| u| \le r}{\sup } \left| \frac{1}{n}\sum _{i=1}^{n} \frac{1}{K}\sum _{k=1}^{K} (I\{ \epsilon _{ik} \le u\} - I \{ \epsilon _{ik} \le 0 \})\right. \\&\qquad \left. -\, E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le u\}\right) + E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le 0\}\right) \right| \\&\qquad + \, \underset{| u| \le r}{\sup } \left| \frac{1}{n}\sum _{i=1}^{n} \frac{1}{K}\sum _{k=1}^{K} (F(u)-F(0))-E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le u\}\right) \right. \\&\left. \qquad + E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le 0\}\right) \right| . \end{aligned}$$

Define the class of functions

$$\begin{aligned} \mathcal {F}_{1} = \left\{ \frac{1}{K}\sum _{k=1}^{K}( I\{\epsilon _{ik} \le u \} - I\{\epsilon _{ik} \le 0 \}):|u| \le r \right\} , \end{aligned}$$

with envelope function $F({\varvec{{x}}},y) = 1$. By Lemma 2.6.15 and Lemma 2.6.18 in van Der Vaart and Wellner (1996), $\mathcal {F}_{1}$ is a Vapnik–Červonenkis (VC) subgraph class. By Theorem 2.6.7 of van Der Vaart and Wellner (1996), we have

$$\begin{aligned} N(\epsilon ,\mathcal {F}_{1}(u),L_{2}(P_{n})) \le \frac{C\Vert F \Vert _{L_{2}(P_{n})}}{\epsilon }. \end{aligned}$$

Since u can take at most nK different values,

$$\begin{aligned} N(\epsilon ,\mathcal {F}_{1},L_{2}(P_{n})) \le \frac{CnK\Vert F \Vert _{L_{2}(P_{n})}}{\epsilon }. \end{aligned}$$

Let $\sigma _{1}^2 = \sup _{f\in \mathcal {F}_{1}} Pf^2$. Then by Theorem 3.12 of Koltchinskii (2011), since $\Vert F \Vert _{L_{2}(P)}$ is bounded by a constant, we have

$$\begin{aligned} E\Vert R_{n} \Vert _{\mathcal {F}_{1}} \le C\left( \sigma _{1}\sqrt{\frac{\log (nK)}{n}}+\frac{\log (nK)}{n}\right) , \end{aligned}$$

where $\Vert R_{n}\Vert _{\mathcal {F}_{1}} = \sup _{f\in \mathcal {F}_{1}} n^{-1}\sum _{i=1}^{n}\epsilon _{i}f({\varvec{{x}}}_{i},y_{i})$ with $\epsilon _{i}$ being i.i.d Rademacher random variables. Using the symmetrization inequality, it can be shown that

$$\begin{aligned} E\Vert P_{n}-P\Vert _{\mathcal {F}_{1}} \le 2E\Vert R_{n} \Vert _{\mathcal {F}_{1}}, \end{aligned}$$

and Talagrand’s inequality in Koltchinskii (2011) gives

$$\begin{aligned} P\left( \Vert P_{n} -P\Vert _{\mathcal {F}_{1}} \ge C\left( \sigma _{1}\sqrt{\frac{\log (nK)}{n}}+\frac{\log (nK)}{n}+\sqrt{\frac{\sigma _{1}^2t}{n}}+\frac{t}{n}\right) \right) \le e^{-t}. \end{aligned}$$

That is, with probability $1-(nK)^{-C}$,

$$\begin{aligned} \Vert P_{n} -P\Vert _{\mathcal {F}_{1}} \le C \left( \sigma _{1}\sqrt{\frac{\log (nK)}{n}}+\frac{\log (nK)}{n}\right) . \end{aligned}$$

It is easy to prove that $\sigma _{1}^{2} \le Cr$. Similarly, define the class of functions

$$\begin{aligned} \mathcal {F}_{2} = \{F(u)-F(0) :|u| \le r \}. \end{aligned}$$

Using similar arguments, it can be shown that

$$\begin{aligned} N(\epsilon ,\mathcal {F}_{2},L_{2}(P_{n})) \le \frac{CnK\Vert F \Vert _{L_{2}(P_{n})}}{\epsilon }, \end{aligned}$$

and then with probability $1-(nK)^{-C}$, we have

$$\begin{aligned}&\underset{| u| \le r}{\sup } \left| \frac{1}{n}\sum _{i=1}^{n} \frac{1}{K}\sum _{k=1}^{K} (F(u)-F(0))-E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le u\}\right) + E \left( \frac{1}{K}\sum _{k=1}^{K}I\{ \epsilon _{ik} \le 0\}\right) \right| \\&\quad \le C\left( \sigma _{2}\sqrt{\frac{\log (nK)}{n}}+\frac{\log (nK)}{n}\right) , \end{aligned}$$

where $\sigma _{2}^2 \le Cr^2$. Thus, with probability at least $1-(nK)^{-C}$,

$$\begin{aligned}&\underset{| u| \le r}{\sup } \left| \frac{1}{nK}\sum _{k=1}^{K} \sum _{i=1}^{n} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik}\le 0 \} - F(u)+F(0)) \right| \\&\quad \le C\left( \sqrt{\frac{r \log (nK)}{n}} +\frac{\log (nK)}{n}\right) . \end{aligned}$$

$\square $
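The variance bounds $\sigma _{1}^{2} \le Cr$ and $\sigma _{2}^{2} \le Cr^{2}$ used in the proof can be sketched as follows. The sketch assumes that each $\epsilon _{ik}$ has a distribution function $F_{k}$ whose density is bounded by a constant $\bar{f}$; this is presumably part of conditions (C1)–(C5), which are not reproduced here, and the symbols $F_{k}$ and $\bar{f}$ are introduced only for illustration. For $f_{u}\in \mathcal {F}_{1}$, every summand $I\{\epsilon _{ik}\le u\}-I\{\epsilon _{ik}\le 0\}$ has the same sign as $u$ and absolute value at most one, so $|f_{u}|\le 1$ and

$$\begin{aligned} Pf_{u}^{2} \le P|f_{u}| = \frac{1}{K}\sum _{k=1}^{K} |F_{k}(u)-F_{k}(0)| \le \bar{f}\,|u| \le \bar{f}\,r . \end{aligned}$$

For $\mathcal {F}_{2}$ the functions are deterministic, and the same density bound gives $(F(u)-F(0))^{2}\le \bar{f}^{2}r^{2}$.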

Proof of Theorem 1

Step 1. Let ${\varvec{{\delta }}}= \check{{\varvec{{b}}}}-{\varvec{{b}}}_{0}$ and ${\varvec{{\Delta }}}= \check{{\varvec{{\beta }}}}-{\varvec{{\beta }}}_{0}$. Since $\tilde{L}({\varvec{{b}}},{\varvec{{\beta }}})$ is convex, we have

$$\begin{aligned} \tilde{L}({\varvec{{b}}},{\varvec{{\beta }}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \ge \nabla _{{\varvec{{\beta }}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) ( {\varvec{{\beta }}}-{\varvec{{\beta }}}_{0})+\nabla _{{\varvec{{b}}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) ( {\varvec{{b}}}-{\varvec{{b}}}_{0}), \end{aligned}$$

for all ${\varvec{{b}}}$ and ${\varvec{{\beta }}}$. Using

$$\tilde{L}( \check{{\varvec{{b}}}},\check{{\varvec{{\beta }}}}) + \lambda \Vert \check{{\varvec{{\beta }}}} \Vert _{1} \le \tilde{L}( {\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) + \lambda \Vert {\varvec{{\beta }}}_{0} \Vert _{1},$$

we get

$$\begin{aligned}&-\Vert \nabla _{{\varvec{{b}}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \Vert _{\infty } \Vert {\varvec{{\delta }}}\Vert _{1} -\Vert \nabla _{{\varvec{{\beta }}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \Vert _{\infty } \Vert {\varvec{{\Delta }}}\Vert _{1} \\&\quad \le \tilde{L}(\check{{\varvec{{b}}}},\check{{\varvec{{\beta }}}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\le \lambda \Vert {\varvec{{\beta }}}_{0} \Vert _{1} - \lambda \Vert {\varvec{{\beta }}}_{0} +{\varvec{{\Delta }}}\Vert _{1}. \end{aligned}$$

Under the event

$$\begin{aligned} \mathcal {A}_{1} = \left\{ \Vert \nabla _{{\varvec{{b}}}} \tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\Vert _{\infty } \le 3\lambda /(2K), \Vert \nabla _{{\varvec{{\beta }}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\Vert _{\infty } \le \lambda /2 \right\} , \end{aligned}$$

this leads to

$$\begin{aligned} -\frac{3\lambda }{2K} \Vert {\varvec{{\delta }}}\Vert _{1} -\frac{\lambda }{2} \Vert {\varvec{{\Delta }}}\Vert _{1} \le \lambda \Vert {\varvec{{\beta }}}_{0} \Vert _{1}- \lambda \Vert {\varvec{{\beta }}}_{0} +{\varvec{{\Delta }}}\Vert _{1}. \end{aligned}$$

Writing $\Vert {\varvec{{\Delta }}}\Vert _{1} = \Vert {\varvec{{\Delta }}}_{S} \Vert _{1}+\Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1}$, $\Vert {\varvec{{\beta }}}_{0} \Vert _{1} =\Vert {\varvec{{\beta }}}_{0S} \Vert _{1}$ and $\Vert {\varvec{{\beta }}}_{0} +{\varvec{{\Delta }}}\Vert _{1}=\Vert {\varvec{{\beta }}}_{0S} +{\varvec{{\Delta }}}_{S} \Vert _{1} + \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} $, we get

$$\begin{aligned} -\frac{3\lambda }{2K} \Vert {\varvec{{\delta }}}\Vert _{1}-\frac{\lambda }{2} \Vert {\varvec{{\Delta }}}_{S} \Vert _{1} -\frac{\lambda }{2} \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \le \lambda \Vert {\varvec{{\Delta }}}_{S} \Vert _{1} - \lambda \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1}. \end{aligned}$$

After rearranging,

$$\begin{aligned} \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \le 3\Vert {\varvec{{\Delta }}}_{S} \Vert _{1}+ \frac{3}{K}\Vert {\varvec{{\delta }}}\Vert _{1}. \end{aligned}$$
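The rearrangement can be spelled out as follows. This is a sketch of the routine algebra, using only $\lambda >0$ and the triangle inequality $\Vert {\varvec{{\beta }}}_{0S} \Vert _{1}-\Vert {\varvec{{\beta }}}_{0S}+{\varvec{{\Delta }}}_{S} \Vert _{1}\le \Vert {\varvec{{\Delta }}}_{S} \Vert _{1}$, which bounds the penalty difference by $\lambda \Vert {\varvec{{\Delta }}}_{S} \Vert _{1}-\lambda \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1}$:

$$\begin{aligned} -\frac{3\lambda }{2K} \Vert {\varvec{{\delta }}}\Vert _{1}-\frac{\lambda }{2} \Vert {\varvec{{\Delta }}}_{S} \Vert _{1} -\frac{\lambda }{2} \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1}&\le \lambda \Vert {\varvec{{\Delta }}}_{S} \Vert _{1} - \lambda \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \\ \Longrightarrow \quad \frac{\lambda }{2} \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1}&\le \frac{3\lambda }{2} \Vert {\varvec{{\Delta }}}_{S} \Vert _{1} + \frac{3\lambda }{2K} \Vert {\varvec{{\delta }}}\Vert _{1}. \end{aligned}$$

Dividing both sides by $\lambda /2$ gives the cone condition stated above.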

Arguing as in Lemma 3 of Gu and Zou (2020), we obtain

$$\begin{aligned} \mathrm {Pr}(\mathcal {A}_{1}) \ge 1-2K\exp {\left( -\frac{9N\lambda ^2}{2}\right) } -2p\exp {\left( -\frac{N\lambda ^2}{2M_{0}}\right) }. \end{aligned}$$

Step 2. It can be easily verified that

$$\begin{aligned}&\tilde{L}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\delta }}}^T\nabla _{{\varvec{{b}}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\Delta }}}^T\nabla _{{\varvec{{\beta }}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\\&\quad =L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\delta }}}^T\nabla _{{\varvec{{b}}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\Delta }}}^T\nabla _{{\varvec{{\beta }}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}). \end{aligned}$$

Define $\epsilon _{ik} = y_{i}-{\varvec{{x}}}_{i}^T{\varvec{{\beta }}}_{0}-b_{0k}$. Using Knight’s identity, we have

$$\begin{aligned} | x-y|- |x |=-y(I(x>0)-I(x<0))+2\int _{0}^{y}[I(x\le t)-I(x\le 0)]dt, \end{aligned}$$

which yields

$$\begin{aligned} \rho _{\tau }(x-y) - \rho _{\tau }(x)= -y(\tau -I \{ x \le 0 \}) + \int _{0}^{y} \left( I \{ x \le u\} - I\{x \le 0 \}\right) du. \end{aligned}$$
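As a sanity check, the check-loss form of Knight's identity can be verified numerically. The sketch below is illustrative only: the function names, the midpoint-rule integration, and the number of grid points `m` are assumptions made for this example.

```python
import numpy as np

def rho(r, tau):
    """Quantile check loss rho_tau(r) = r * (tau - I{r < 0})."""
    return r * (tau - (r < 0))

def knight_lhs(x, y, tau):
    """Left-hand side: rho_tau(x - y) - rho_tau(x)."""
    return rho(x - y, tau) - rho(x, tau)

def knight_rhs(x, y, tau, m=20000):
    """Right-hand side, with the integral over [0, y] done by a midpoint rule.

    The substitution u = y * s maps the integral to y * (mean over s in [0, 1]),
    which also handles negative y with the correct sign.
    """
    u = (np.arange(m) + 0.5) * (y / m)
    integrand = (x <= u).astype(float) - float(x <= 0)
    integral = y * integrand.mean()
    return -y * (tau - float(x <= 0)) + integral

# Compare both sides on a few hand-picked points.
for x, y, tau in [(1.0, 2.0, 0.3), (-1.0, -2.0, 0.3), (0.5, -1.5, 0.8)]:
    print(x, y, tau, knight_lhs(x, y, tau), knight_rhs(x, y, tau))
```

The two sides agree up to the discretization error of the midpoint rule, which is of order $|y|/m$ because the integrand is a step function.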

Then, it can be seen that

$$\begin{aligned}&\rho _{\tau _{k}}(y_{i}-{\varvec{{x}}}_{i}^{T}({\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}) -(b_{0k}+\delta _{k}))- \rho _{\tau _{k}}(y_{i}-{\varvec{{x}}}_{i}^{T}{\varvec{{\beta }}}_{0}-b_{0k})\\&\quad +\, {\varvec{{x}}}_{i}^T{\varvec{{\Delta }}}(\tau _{k}-I\{ \epsilon _{ik} \le 0 \})+\delta _{k}(\tau _{k}-I\{ \epsilon _{ik} \le 0 \})\\&\quad =\int _{0}^{{\varvec{{x}}}_{i}^{T}{\varvec{{\Delta }}}+\delta _{k}} \left( I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik} \le 0 \}\right) du. \end{aligned}$$

Thus, it leads to

$$\begin{aligned}&L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\delta }}}^T\nabla _{{\varvec{{b}}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\Delta }}}^T\nabla _{{\varvec{{\beta }}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\\&\quad -\, EL_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})+\, EL_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\\&\quad =\frac{1}{nK} \sum _{k=1}^{K}\sum _{i=1}^{n}\int _{0}^{{\varvec{{x}}}_{i}^{T}{\varvec{{\Delta }}}+\delta _{k}} \left( I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik} \le 0 \} - F(u)+F(0)\right) du. \end{aligned}$$

Let

$$\begin{aligned} \mathcal {A}_{2}&= \left\{ \underset{| u| \le r}{\sup } \left| \frac{1}{nK}\sum _{k=1}^{K} \sum _{i=1}^{n} (I\{ \epsilon _{ik} \le u\} - I \{\epsilon _{ik} \le 0 \} - F(u)+F(0))\right| \right. \\&\left. \le C\left( \sqrt{\frac{r \log (nK)}{n}} +\frac{\log (nK)}{n}\right) \right\} . \end{aligned}$$

Based on the proof of Lemma 1, we know that for $r >0$,

$$\begin{aligned} \mathrm {Pr}(\mathcal {A}_{2}) \ge 1-(nK)^{-C}. \end{aligned}$$

Using the facts that $\Vert {\varvec{{\Delta }}}\Vert _2 +\Vert {\varvec{{\delta }}}\Vert _2 \le t$ and $\max _{i} \Vert {\varvec{{x}}}_{i}^{T}{\varvec{{\Delta }}}+\delta _{k} \Vert _2 \le c_{n}\Vert {\varvec{{\Delta }}}\Vert _{1} +\Vert {\varvec{{\delta }}}\Vert _{1} \le 4c_{n}\Vert {\varvec{{\Delta }}}_{S} \Vert _{1} +\left( 1+\frac{3}{K}\right) \Vert {\varvec{{\delta }}}\Vert _{1} \le 4c_{n}\sqrt{s} \Vert {\varvec{{\Delta }}}\Vert _2 +(K+3)\Vert {\varvec{{\delta }}}\Vert _2\le 4c_{n}\sqrt{s} \Vert {\varvec{{\Delta }}}\Vert _2 + (K+3)(t-\Vert {\varvec{{\Delta }}}\Vert _2) \le 4c_{n}\sqrt{s}t$, we get

$$\begin{aligned}&\underset{\underset{\Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \le 3\Vert {\varvec{{\Delta }}}_{S} \Vert _{1}+\frac{3}{K}\Vert {\varvec{{\delta }}}\Vert _{1}}{\Vert {\varvec{{\Delta }}}\Vert _2 +\Vert {\varvec{{\delta }}}\Vert _2 \le t} }{\sup }\left| L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-{\varvec{{\delta }}}^T\nabla _{{\varvec{{b}}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\right. \\&\qquad \left. -\, {\varvec{{\Delta }}}^T\nabla _{{\varvec{{\beta }}}} L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})-EL_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})+EL_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \right| \\&\quad \le C\int _{0}^{4c_{n}\sqrt{s}t} \left( \sqrt{\frac{r\log (nK)}{n} }+\frac{\log (nK)}{n}\right) dr\\&\quad = C\left( \frac{(c_{n}\sqrt{s}t)^{3/2} \sqrt{\log (nK)}}{\sqrt{n}} + \frac{(c_{n}\sqrt{s}t)\log (nK)}{n}\right) . \end{aligned}$$
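The final step is an elementary integral. Writing $a = 4c_{n}\sqrt{s}t$ as shorthand (the symbol $a$ is introduced here only for illustration), we have

$$\begin{aligned} \int _{0}^{a} \left( \sqrt{\frac{r\log (nK)}{n}}+\frac{\log (nK)}{n}\right) dr&= \frac{2}{3}a^{3/2}\sqrt{\frac{\log (nK)}{n}}+\frac{a\log (nK)}{n}\\&= \frac{16}{3}\,\frac{(c_{n}\sqrt{s}t)^{3/2}\sqrt{\log (nK)}}{\sqrt{n}}+\frac{4(c_{n}\sqrt{s}t)\log (nK)}{n}, \end{aligned}$$

where the numerical factors $16/3$ and $4$ are absorbed into the generic constant $C$.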

Step 3. Step 1 implies

$$\begin{aligned} \underset{\underset{\Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \le 3\Vert {\varvec{{\Delta }}}_{S} \Vert _{1}+\frac{3}{K}\Vert {\varvec{{\delta }}}\Vert _{1}}{\Vert {\varvec{{\Delta }}}\Vert _2 +\Vert {\varvec{{\delta }}}\Vert _2 \le t} }{\inf }\tilde{L}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})+\lambda \Vert {\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}\Vert _{1}-\lambda \Vert {\varvec{{\beta }}}_{0}\Vert _{1}\le 0. \end{aligned}$$

We have $\Vert {\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}\Vert _1-\Vert {\varvec{{\beta }}}_{0}\Vert _1\ge -\Vert {\varvec{{\Delta }}}_S\Vert _1\ge -\sqrt{s}\Vert {\varvec{{\Delta }}}_{S}\Vert _2\ge -\sqrt{s}t$. Furthermore, using Eq. (3.7) of Belloni and Chernozhukov (2011) to bound $E[ L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}) ]-E[ L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})]$ from below, together with the results of the previous steps, we have

$$\begin{aligned}&\tilde{L}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}})-\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0}) \\&\quad \ge E[ L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}) ]-E[ L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})]\\&\qquad -\, \Vert {\varvec{{\Delta }}}\Vert _{1}\Vert \nabla _{{\varvec{{\beta }}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\Vert _{\infty }-\Vert {\varvec{{\delta }}}\Vert _{1}\Vert \nabla _{{\varvec{{b}}}}\tilde{L}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})\Vert _{\infty }\\&\qquad -\, C\left( \frac{(c_{n}\sqrt{s}t)^{3/2} \sqrt{\log (nK)}}{\sqrt{n}} + \frac{(c_{n}\sqrt{s}t)\log (nK)}{n}\right) \\&\quad \ge C(t^2\wedge t) -C\lambda \sqrt{s}t- C\left( \frac{(c_{n}\sqrt{s}t)^{3/2} \sqrt{\log (nK)}}{\sqrt{n}} + \frac{(c_{n}\sqrt{s}t)\log (nK)}{n}\right) . \end{aligned}$$

Thus, we have

$$\begin{aligned} C(t^2\wedge t) -C\lambda \sqrt{s}t - C\left( \frac{(c_{n}\sqrt{s}t)^{3/2} \sqrt{\log (nK)}}{\sqrt{n}} + \frac{(c_{n}\sqrt{s}t)\log (nK)}{n}\right) \le 0, \end{aligned}$$

and

$$\begin{aligned} t\le C\left( \lambda \sqrt{s}+\frac{c_{n}\sqrt{s}\log {(nK)}}{n}+\frac{s^{3/2} c_{n}^{2} {\log (nK)}}{n}\right) \le C\left( \lambda \sqrt{s}+\frac{s^{3/2} c_{n}^{2} {\log (nK)}}{n}\right) . \end{aligned}$$

Then, with probability at least

$$\begin{aligned}&\mathrm {Pr}(\mathcal {A}_{1}\bigcap \mathcal {A}_{2}) \ge 1- \mathrm {Pr}(\mathcal {A}_{1}^c)-\mathrm {Pr}(\mathcal {A}_{2}^c)\ge 1\\&\quad -\, 2K\exp {\left( -\frac{9N\lambda ^2}{2}\right) } -2p\exp {\left( -\frac{N\lambda ^2}{2M_{0}}\right) }-(nK)^{-C}, \end{aligned}$$

we have

$$\begin{aligned} \Vert \check{{\varvec{{\beta }}}}-{\varvec{{\beta }}}_{0} \Vert _{2} \le t. \end{aligned}$$

The second result is obtained by noting $\Vert {\varvec{{\Delta }}}\Vert _{1} \le 4\Vert {\varvec{{\Delta }}}_{S}\Vert _{1}+\frac{3}{K}\Vert {\varvec{{\delta }}}\Vert _{1} \le C \sqrt{s}t$, such that

$$\begin{aligned} \Vert \check{{\varvec{{\beta }}}}-{\varvec{{\beta }}}_{0} \Vert _{1} \le C\sqrt{s}t\le C \left( \lambda s+\frac{s^{2}c_{n}^2 \log (nK)}{n}\right) . \end{aligned}$$

$\square $

About this article

Cite this article

Yang, Y., Wang, L. Communication-efficient sparse composite quantile regression for distributed data. Metrika 86, 261–283 (2023). https://doi.org/10.1007/s00184-022-00868-z

Received: 13 February 2021
Accepted: 23 May 2022
Published: 16 June 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s00184-022-00868-z
