
Sparse and debiased lasso estimation and inference for high-dimensional composite quantile regression with distributed data

  • Original Paper
  • Published in TEST

Abstract

We consider settings in which the data are inherently distributed and focus on statistical learning in the presence of heavy-tailed and/or asymmetric errors. The composite quantile regression (CQR) estimator is a robust and efficient alternative to the ordinary least squares and single quantile regression estimators. Based on aggregated and communication-efficient approaches, we propose two classes of sparse and debiased lasso CQR estimation and inference methods. Specifically, an aggregated \(\ell _1\)-penalized CQR estimator and a communication-efficient \(\ell _1\)-penalized CQR estimator are first obtained. To construct confidence intervals and perform hypothesis tests, a unified debiasing framework based on smoothed decorrelated score equations is introduced to eliminate the biases caused by the lasso penalty. Finally, a hard-thresholding step is employed to ensure that the debiased lasso estimators are sparse. The convergence rates and asymptotic properties of the proposed estimators are established, and their performance is evaluated through simulations and a real-world dataset.
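Two of the building blocks named in the abstract, the composite quantile loss and the final hard-thresholding step, can be illustrated with a minimal numerical sketch. This is not the authors' implementation (their code is in R and available on request); the function names, quantile levels, and threshold value below are illustrative assumptions only:

```python
import numpy as np

def check_loss(u, tau):
    # Quantile ("check") loss: rho_tau(u) = u * (tau - 1{u < 0}).
    return u * (tau - (u < 0))

def cqr_loss(beta, intercepts, X, y, taus):
    # Composite quantile loss: average the check loss over K quantile
    # levels, sharing a single slope vector beta across all levels
    # while each level tau_k gets its own intercept.
    total = 0.0
    for b0, tau in zip(intercepts, taus):
        total += np.mean(check_loss(y - b0 - X @ beta, tau))
    return total / len(taus)

def hard_threshold(beta, thresh):
    # Zero out coefficients whose magnitude falls below the threshold,
    # restoring sparsity to a debiased (hence generally dense) estimate.
    return np.where(np.abs(beta) >= thresh, beta, 0.0)
```

In the paper's pipeline, a penalized CQR fit would minimize `cqr_loss` plus an \(\ell_1\) penalty, the debiasing step would correct the resulting shrinkage bias, and `hard_threshold` would then sparsify the debiased estimate.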


Data availability

The data are available from http://archive.ics.uci.edu/ml/datasets/communities+and+crime+unnormalized or from the authors upon request.

Code availability

All simulations are implemented in R; the code is available from the authors upon request.


Acknowledgements

The authors would like to thank Editor, Associate Editor, and two anonymous referees for helpful comments and suggestions. Lei Wang’s research was supported by the Fundamental Research Funds for the Central Universities and the National Natural Science Foundation of China (12271272).

Funding

Wang’s research was supported by the National Natural Science Foundation of China (12271272), the Natural Science Foundation of Tianjin (18JCYBJC41100) and the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

Authors

Contributions

ZH: investigation, formal analysis, software; WM: investigation, formal analysis; LW: methodology, formal analysis, supervision, writing.

Corresponding author

Correspondence to Lei Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hou, Z., Ma, W. & Wang, L. Sparse and debiased lasso estimation and inference for high-dimensional composite quantile regression with distributed data. TEST 32, 1230–1250 (2023). https://doi.org/10.1007/s11749-023-00875-w

