Abstract
We consider settings in which the data are inherently distributed and focus on statistical learning in the presence of heavy-tailed and/or asymmetric errors. The composite quantile regression (CQR) estimator is a robust and efficient alternative to the ordinary least squares and single quantile regression estimators. Building on aggregated and communication-efficient approaches, we propose two classes of sparse and debiased lasso CQR estimation and inference methods. Specifically, we first obtain an aggregated \(\ell _1\)-penalized CQR estimator and an \(\ell _1\)-penalized communication-efficient CQR estimator. To construct confidence intervals and perform hypothesis tests, a unified debiasing framework based on smoothed decorrelated score equations is introduced to eliminate the biases caused by the lasso penalty. Finally, a hard-thresholding method is employed to ensure that the debiased lasso estimators are sparse. The convergence rates and asymptotic properties of the proposed estimators are established, and their performance is evaluated through simulations and a real-world dataset.
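To fix ideas, the two building blocks named in the abstract, the \(\ell _1\)-penalized composite quantile loss and the final hard-thresholding step, can be sketched numerically. This is a minimal illustration under assumed notation (quantile levels \(\tau_1,\dots,\tau_K\), quantile-specific intercepts \(b_k\), a common slope \(\beta\)); it is not the authors' implementation (the paper's simulations are in R), and the function names are hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    # Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0}).
    return u * (tau - (u < 0))

def cqr_objective(beta, b, X, y, taus, lam):
    # l1-penalized CQR objective:
    #   (1 / (n*K)) * sum_k sum_i rho_{tau_k}(y_i - b_k - x_i' beta)
    #     + lam * ||beta||_1
    # All K quantile levels share the slope beta; only the
    # intercepts b_k vary across levels.
    n = len(y)
    total = 0.0
    for k, tau in enumerate(taus):
        resid = y - b[k] - X @ beta
        total += check_loss(resid, tau).sum()
    return total / (n * len(taus)) + lam * np.abs(beta).sum()

def hard_threshold(beta, t):
    # Zero out coordinates with |beta_j| <= t, restoring sparsity
    # after the debiasing step (the debiased estimator is dense).
    return np.where(np.abs(beta) > t, beta, 0.0)
```

For example, `hard_threshold(np.array([0.5, 0.01, -2.0]), 0.1)` keeps the first and third coordinates and zeroes the second, which is how the debiased (hence non-sparse) estimator is mapped back to a sparse one.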
Data availability
The data are available from http://archive.ics.uci.edu/ml/datasets/communities+and+crime+unnormalized or from the authors upon request.
Code availability
All simulations are implemented in R; the code is available from the authors upon request.
Acknowledgements
The authors would like to thank the Editor, the Associate Editor, and two anonymous referees for their helpful comments and suggestions. Lei Wang's research was supported by the Fundamental Research Funds for the Central Universities and the National Natural Science Foundation of China (12271272).
Funding
Wang’s research was supported by the National Natural Science Foundation of China (12271272), the Natural Science Foundation of Tianjin (18JCYBJC41100) and the Fundamental Research Funds for the Central Universities.
Author information
Contributions
ZH: investigation, formal analysis, software; WM: investigation, formal analysis; LW: methodology, formal analysis, supervision, writing.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hou, Z., Ma, W. & Wang, L. Sparse and debiased lasso estimation and inference for high-dimensional composite quantile regression with distributed data. TEST 32, 1230–1250 (2023). https://doi.org/10.1007/s11749-023-00875-w