Abstract
The composite quantile regression (CQR) estimator is a robust and efficient alternative to the M-estimator and the ordinary quantile regression estimator in linear models. To construct sparse CQR estimates from distributed data, we propose a penalized communication-efficient surrogate loss function that is computationally superior to the original global loss function. The proposed method only requires the worker machines to compute gradients of the unpenalized loss on their local data, and the central machine to solve a regular penalized estimation problem. We prove that the estimation error of the proposed method matches the error bound of the centralized method that analyzes the entire data set at once. A modified alternating direction method of multipliers (ADMM) algorithm is developed to compute the sparse CQR estimator efficiently. The performance of the proposed estimator is studied through simulations, and an application to a real data set is also presented.
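As a rough illustration of the communication pattern described above (not the authors' exact algorithm), the following Python sketch shows the single message each worker sends: the subgradient of its local, unpenalized CQR loss at the current iterate, which the central machine averages to build the surrogate loss. The function names `cqr_subgradient` and `surrogate_gradient`, and the plain averaging scheme, are illustrative assumptions.

```python
import numpy as np

def cqr_subgradient(X, y, beta, b, taus):
    """Subgradient of the unpenalized CQR loss w.r.t. beta,
    averaged over samples and quantile levels (illustrative sketch)."""
    n, p = X.shape
    g = np.zeros(p)
    for k, tau in enumerate(taus):
        resid = y - X @ beta - b[k]
        # Subgradient of the check loss rho_tau at each residual.
        psi = (resid < 0).astype(float) - tau
        g += X.T @ psi / n
    return g / len(taus)

def surrogate_gradient(X_list, y_list, beta, b, taus):
    """Each worker computes its local subgradient without a penalty;
    the central machine only needs their average (one communication round)."""
    grads = [cqr_subgradient(Xk, yk, beta, b, taus)
             for Xk, yk in zip(X_list, y_list)]
    return np.mean(grads, axis=0)
```

With equal-sized local samples, the average of the local subgradients coincides with the full-data subgradient, which is why one round of gradient communication suffices to build the surrogate loss at the center.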
References
Belloni A, Chernozhukov V (2011) L1-penalized quantile regression in high-dimensional sparse models. Ann Stat 39:82–130
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3:1–122
Chen X, Xie M (2014) A split-and-conquer approach for analysis of extraordinarily large data. Statistica Sinica 24:1655–1684
Dekel O, Gilad-Bachrach R, Shamir O, Xiao L (2012) Optimal distributed online prediction using mini-batches. J Mach Learn Res 13:165–202
Fan J, Guo Y, Wang K (2021) Communication-efficient accurate statistical estimation. J Am Stat Assoc, to appear
Gu Y, Zou H (2020) Sparse composite quantile regression in ultrahigh dimensions with tuning parameter calibration. IEEE Trans Inf Theory 66:7132–7154
Javanmard A, Montanari A (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res 15:2869–2909
Jiang R, Hu X, Yu K, Qian W (2018) Composite quantile regression for massive datasets. Statistics 52:980–1004
Jordan MI, Lee JD, Yang Y (2019) Communication-efficient distributed statistical inference. J Am Stat Assoc 114:668–681
Kai B, Li R, Zou H (2010) Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. J R Stat Soc Ser B Stat Methodol 72:49–69
Kai B, Li R, Zou H (2011) New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann Stat 39:305–332
Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50
Koenker R, Ng P (2005) Inequality constrained quantile regression. Sankhya Indian J Stat 67:418–440
Koltchinskii V (2011) Oracle inequalities in empirical risk minimization and sparse recovery problems. Springer, New York
Lee JD, Liu Q, Sun Y, Taylor JE (2017) Communication-efficient sparse regression. J Mach Learn Res 18:115–144
van de Geer S, Bühlmann P, Ritov YA, Dezeure R (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. Ann Stat 42:1166–1202
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York
Volgushev S, Chao SK, Cheng G (2019) Distributed inference for quantile regression processes. Ann Stat 47:1634–1662
Wang L, Lian H (2020) Communication-efficient estimation of high-dimensional quantile regression. Anal Appl 18:1057–1075
Zhang C, Zhang S (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. J R Stat Soc Ser B Stat Methodol 76:217–242
Zhang Y, Duchi JC, Wainwright MJ (2013) Communication-efficient algorithms for statistical optimization. J Mach Learn Res 14:3321–3363
Zhang Y, Duchi JC, Wainwright MJ (2015) Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates. J Mach Learn Res 16:3299–3340
Zhao K, Lian H (2016) A note on the efficiency of composite quantile regression. J Stat Comput Simul 86:1334–1341
Zhao W, Zhang F, Lian H (2020) Debiasing and distributed estimation for high-dimensional quantile regression. IEEE Trans Neural Netw Learn Syst 31:2569–2577
Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36:1108–1126
Acknowledgements
We are grateful to the Editor, an associate editor and one anonymous referee for their insightful comments and suggestions, which have led to significant improvements. This article was supported by the National Natural Science Foundation of China [Grant Nos. 11871287, 11771144, 11801359], the Natural Science Foundation of Tianjin [Grant No. 18JCYBJC41100], Fundamental Research Funds for the Central Universities [ZB22000102] and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin.
Appendix
Lemma 1
Under conditions (C1)–(C5), with probability at least \(1-(nK)^{-C}\),
Proof of Lemma 1
Firstly, we write
Define the class of functions
with envelope function \(F({\varvec{{x}}},y) = 1\). By Lemmas 2.6.15 and 2.6.18 in van der Vaart and Wellner (1996), \(\mathcal {F}_{1}\) is a Vapnik–Červonenkis (VC) subgraph class. By Theorem 2.6.7 of van der Vaart and Wellner (1996), we have
Since u can take at most nK different values,
Let \(\sigma _{1}^2 = \sup _{f\in \mathcal {F}_{1}} Pf^2\). Then by Theorem 3.12 of Koltchinskii (2011), with \(\Vert F \Vert _{L_{2}(P)}\) bounded by a constant, we have
where \(\Vert R_{n}\Vert _{\mathcal {F}_{1}} = \sup _{f\in \mathcal {F}_{1}} n^{-1}\sum _{i=1}^{n}\epsilon _{i}f({\varvec{{x}}}_{i},y_{i})\) with \(\epsilon _{i}\) being i.i.d. Rademacher random variables. Using the symmetrization inequality, it can be shown that
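For reference, the symmetrization inequality invoked here is the standard one (see, e.g., Lemma 2.3.1 of van der Vaart and Wellner 1996); in the present notation it reads

```latex
\mathbb {E}\sup _{f\in \mathcal {F}_{1}}\Big |\frac{1}{n}\sum _{i=1}^{n}\big (f({\varvec{{x}}}_{i},y_{i})-Pf\big )\Big |
\;\le \; 2\,\mathbb {E}\,\Vert R_{n}\Vert _{\mathcal {F}_{1}},
```

with \(\Vert R_{n}\Vert _{\mathcal {F}_{1}}\) the Rademacher process defined above.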
and Talagrand’s inequality in Koltchinskii (2011) gives
That is, with probability \(1-(nK)^{-C}\),
It is easy to prove that \(\sigma _{1}^{2} \le Cr\). Similarly, define the class of functions
Using similar arguments, it can be shown that
and then with probability \(1-(nK)^{-C}\), we have
where \(\sigma _{2}^2 \le Cr^2\). Thus, with probability at least \(1-(nK)^{-C}\),
\(\square \)
Proof of Theorem 1
Step 1 Let \({\varvec{{\delta }}}= \check{{\varvec{{b}}}}-{\varvec{{b}}}_{0}\) and \({\varvec{{\Delta }}}= \check{{\varvec{{\beta }}}}-{\varvec{{\beta }}}_{0}\). Since \(\tilde{L}({\varvec{{b}}},{\varvec{{\beta }}})\) is convex, we have
for all \({\varvec{{b}}}\) and \({\varvec{{\beta }}}\). Using
we get
Under event
it leads to
Writing \(\Vert {\varvec{{\Delta }}}\Vert _{1} = \Vert {\varvec{{\Delta }}}_{S} \Vert _{1}+\Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1}\), \(\Vert {\varvec{{\beta }}}_{0} \Vert _{1} =\Vert {\varvec{{\beta }}}_{0S} \Vert _{1}\) and \(\Vert {\varvec{{\beta }}}_{0} +{\varvec{{\Delta }}}\Vert _{1}=\Vert {\varvec{{\beta }}}_{0S} +{\varvec{{\Delta }}}_{S} \Vert _{1} + \Vert {\varvec{{\Delta }}}_{S^{c}} \Vert _{1} \), we get
After rearranging,
Similar to Lemma 3 of Gu and Zou (2020), it leads to
Step 2 It can be easily verified that
Define \(\epsilon _{ik} = y_{i}-{\varvec{{x}}}_{i}^T{\varvec{{\beta }}}_{0}-b_{0k}\). Using Knight’s identity, we have
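Knight's identity referred to here is the standard decomposition of the check loss \(\rho _{\tau }\): for any \(u\) and \(v\),

```latex
\rho _{\tau }(u-v)-\rho _{\tau }(u)
= v\big (\mathbb {1}(u\le 0)-\tau \big )
+ \int _{0}^{v}\big (\mathbb {1}(u\le s)-\mathbb {1}(u\le 0)\big )\,ds .
```

Applied with \(u=\epsilon _{ik}\) and \(v={\varvec{{x}}}_{i}^{T}{\varvec{{\Delta }}}+\delta _{k}\), it separates the linear term from a nonnegative integral remainder.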
which yields
Then, it can be seen that
Thus, it leads to
Let
Based on the proof of Lemma 1, we know that for \(r >0\),
Using the facts that \(\Vert {\varvec{{\Delta }}}\Vert _2 +\Vert {\varvec{{\delta }}}\Vert _2 \le t\), \(\max _{i} \Vert {\varvec{{x}}}_{i}^{T}{\varvec{{\Delta }}}+\delta _{k} \Vert _2 \le c_{n}\Vert {\varvec{{\Delta }}}\Vert _{1} +\Vert {\varvec{{\delta }}}\Vert _{1} \le 4c_{n}\Vert {\varvec{{\Delta }}}_{S} \Vert _{1} +\left( 1+\frac{3}{K}\right) \Vert {\varvec{{\delta }}}\Vert _{1} \le 4c_{n}\sqrt{s} \Vert {\varvec{{\Delta }}}\Vert _2 +(K+3)\Vert {\varvec{{\delta }}}\Vert _2\le 4c_{n}\sqrt{s} \Vert {\varvec{{\Delta }}}\Vert _2 + (K+3)(t-\Vert {\varvec{{\Delta }}}\Vert _2) \le 4c_{n}\sqrt{s}t\), we get
Step 3 Step 1 implies
We have \(\Vert {\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}\Vert _1-\Vert {\varvec{{\beta }}}_{0}\Vert _1\ge -\Vert {\varvec{{\Delta }}}_S\Vert _1\ge -\sqrt{s}\Vert {\varvec{{\Delta }}}_{S}\Vert _2\ge -\sqrt{s}t\). Furthermore, using Eq. (3.7) of Belloni and Chernozhukov (2011) and the results from the previous steps, we obtain the following lower bound for \(E[ L_{1}({\varvec{{b}}}_{0}+{\varvec{{\delta }}},{\varvec{{\beta }}}_{0}+{\varvec{{\Delta }}}) ]-E[ L_{1}({\varvec{{b}}}_{0},{\varvec{{\beta }}}_{0})]\):
Thus, we have
and
Then, with probability at least
we have
The second result is obtained by noting \(\Vert {\varvec{{\Delta }}}\Vert _{1} \le 4\Vert {\varvec{{\Delta }}}_{S}\Vert _{1}+\frac{3}{K}\Vert {\varvec{{\delta }}}\Vert _{1} \le C \sqrt{s}t\), such that
\(\square \)
Yang, Y., Wang, L. Communication-efficient sparse composite quantile regression for distributed data. Metrika 86, 261–283 (2023). https://doi.org/10.1007/s00184-022-00868-z