Adaptive quantile regressions for massive datasets

Jiang, Rong; Chen, Wei-wei; Liu, Xin

doi:10.1007/s00362-020-01170-8

Regular Article
Published: 23 March 2020

Volume 62, pages 1981–1995, (2021)

Statistical Papers

Rong Jiang¹,
Wei-wei Chen¹ &
Xin Liu¹

Abstract

Analyzing massive datasets is challenging because of the limits of a computer's primary memory. Adaptive quantile regression is a robust and efficient estimation method. For computational efficiency, we propose adaptive smoothing quantile regression (ASQR) and use it to analyze massive datasets. The proposed approach significantly reduces the amount of primary memory required, and the resulting estimator is as efficient as if the entire dataset were analyzed at once. Both simulations and data analysis are conducted to illustrate the finite-sample performance of the proposed method.

References

Bloznelis D, Claeskens G, Zhou J (2019) Composite versus model-averaged quantile regression. J Stat Plan Inference 200:32–46
Chen X, Xie M (2014) A split-and-conquer approach for analysis of extraordinarily large data. Stat Sin 24:1655–1684
Chen X, Liu W, Zhang Y (2019) Quantile regression under memory constraint. arXiv:1810.08264
Draper NR, Smith H (1998) Applied regression analysis, 3rd edn. Wiley, New York
Fan TH, Lin D, Cheng KF (2007) Regression analysis for massive datasets. Data Knowl Eng 61:554–562
Jiang R, Hu X, Yu K, Qian W (2018) Composite quantile regression for massive datasets. Statistics 52:980–1004
Jiang X, Li J, Xia T, Yan W (2016) Robust and efficient estimation with weighted composite quantile regression. Physica A 457:413–423
Horowitz J (1998) Bootstrap methods for median regression models. Econometrica 66:1327–1351
Koenker R (1984) A note on L-estimates for linear models. Stat Prob Lett 2:323–325
Koenker R (2005) Quantile regression. Cambridge University Press, New York
Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50
Li R, Lin D, Li B (2013) Statistics inference in massive data sets. Appl Stoch Model Bus Ind 29:399–409
Lin N, Xi R (2011) Aggregated estimating equation estimation. Stat Interface 4:73–83
Pang L, Lu W, Wang H (2012) Variance estimation in censored quantile regression via induced smoothing. Comput Stat Data Anal 56:785–796
Schifano ED, Wu J, Wang C, Yan J, Chen MH (2016) Online updating of statistical inference in the big data setting. Technometrics 58:393–403
Tian Y, Zhu Q, Tian M (2016) Estimation of linear composite quantile regression using EM algorithm. Stat Prob Lett 117:183–191
Xu Q, Cai C, Jiang C, Sun F, Huang X (2017) Block average quantile regression for massive dataset. Stat Papers. https://doi.org/10.1007/s00362-017-0932-6
Yang K, Zhu L, Xu W (2018) Adaptive composite quantile regressions and their asymptotic relative efficiency. J Stat Comput Simul 88:900–919
Zhao K, Lian H (2016) A note on the efficiency of composite quantile regression. J Stat Comput Simul 86:1334–1341

Acknowledgements

The two anonymous referees provided numerous valuable comments that improved the manuscript. This research is supported by the Shanghai Sailing Program (No. 17YF1400800) and the National Natural Science Foundation of China (No. 11801069 and No. 11871143).

Author information

Authors and Affiliations

Department of Statistics, College of Science, Donghua University, Shanghai, People’s Republic of China
Rong Jiang, Wei-wei Chen & Xin Liu

Corresponding author

Correspondence to Rong Jiang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Theorem 2.1

By Theorem 4.3 in Chen et al. (2019), and because the error term $\varepsilon $ is independent of $\mathbf{X}$, we have

$$\begin{aligned} {\hat{\beta }}_{\tau _{q},h}-{\beta }_0=\frac{1}{N}f^{-1}(b_{\tau _q})\mathbf{C}^{-1}\sum _{i=1}^N\mathbf{x}_i \left( I\{\varepsilon _i\ge 0\}+ \tau _{q}-1 \right) +r_N, \end{aligned}$$

where $\Vert r_N\Vert _2=O_p\left( (p/N)^{3/4}(\log N)^{1/2}\right) $. Thus, for a given quantile $\tau _{q}$,

$$\begin{aligned} E[{\hat{\beta }}_{\tau _{q},h}]&=\beta _0+r_N,\\ Var[{\hat{\beta }}_{\tau _{q},h}]&=E[({\hat{\beta }}_{\tau _{q},h}-\beta _0)^2|\tau _{q}] =N^{-1}{} \mathbf{C}^{-1}\tau _{q}(1-\tau _{q})/f^2(b_{\tau _{q}})+r_N^2. \end{aligned}$$

See, for example, Koenker (2005) for details on the two results above. It is also straightforward to verify that, for two quantile levels $\tau _{q}$ and $\tau _{q'}$,

$$\begin{aligned} Cov({\hat{\beta }}_{\tau _{q},h},{\hat{\beta }}_{\tau _{q'},h}) =N^{-1}{} \mathbf{C}^{-1}(\min (\tau _{q},\tau _{q'})-\tau _{q}\tau _{q'})/\{f(b_{\tau _{q}}) f(b_{\tau _{q'}})\}+r_N^2. \end{aligned}$$
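To make the scalar part of the variance and covariance expressions above concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the errors are standard normal, that $b_{\tau}$ is the $\tau$-quantile of $\varepsilon$, and that $f$ is the error density; the remainder $r_N^2$ is ignored. The function name `cov_factor` is hypothetical and is not part of the paper.

```python
import math
from statistics import NormalDist


def cov_factor(tau_q, tau_qp, dist=NormalDist()):
    """Scalar that multiplies N^{-1} C^{-1} in Cov(beta_hat_q, beta_hat_q').

    Assumption: `dist` is the error distribution (standard normal by default),
    so b_tau = dist.inv_cdf(tau) and f = dist.pdf.
    """
    b_q = dist.inv_cdf(tau_q)    # tau_q-quantile of the error
    b_qp = dist.inv_cdf(tau_qp)  # tau_q'-quantile of the error
    numerator = min(tau_q, tau_qp) - tau_q * tau_qp
    return numerator / (dist.pdf(b_q) * dist.pdf(b_qp))


# With tau_q == tau_q' the factor reduces to tau(1 - tau) / f^2(b_tau).
median_factor = cov_factor(0.5, 0.5)
# Cross term between the first and third quartiles.
quartile_cross = cov_factor(0.25, 0.75)

print(median_factor, quartile_cross)
```

Under the standard normal assumption the median factor equals $\pi /2$, the familiar asymptotic variance of the sample median relative to the mean.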

In addition, the distributions of ${\hat{\beta }}_{\tau _{q},h}$, $q=1,\ldots ,Q$, and the joint distributions of ${\hat{\beta }}_{\tau _{q},h}$ and ${\hat{\beta }}_{\tau _{q'},h}$, $q,q'=1,\ldots ,Q$, are all asymptotically normal. Thus, ${\hat{\beta }}=\sum _{q=1}^Qw_q{\hat{\beta }}_{\tau _{q},h}$ is also asymptotically normal. The mean and variance of ${\hat{\beta }}$ are as follows,

$$\begin{aligned} E[{\hat{\beta }}]&=\sum _{q=1}^Qw_qE[{\hat{\beta }}_{\tau _{q},h}]=\beta _0+r_N,\\ Var[{\hat{\beta }}]&=\sum _{q=1}^Q\sum _{q'=1}^Qw_qw_{q'}Cov({\hat{\beta }}_{\tau _{q},h}, {\hat{\beta }}_{\tau _{q'},h})\\&=N^{-1}{} \mathbf{C}^{-1}\sum _{q=1}^Q\sum _{q'=1}^Qw_qw_{q'} \frac{\min (\tau _{q},\tau _{q'})-\tau _{q}\tau _{q'}}{f(b_{\tau _{q}})f(b_{\tau _{q'}})} +r_N^2. \end{aligned}$$
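The double sum in $Var[{\hat{\beta }}]$ is a quadratic form in the weights, which the following Python sketch evaluates. As assumptions for illustration only, the errors are taken to be standard normal, the weights are required to sum to one (consistent with $E[{\hat{\beta }}]=\beta _0+r_N$), and the remainder $r_N^2$ is dropped. The function name `combined_variance_factor` is hypothetical.

```python
import math
from statistics import NormalDist


def combined_variance_factor(taus, weights, dist=NormalDist()):
    """Scalar that multiplies N^{-1} C^{-1} in Var(sum_q w_q * beta_hat_q).

    Assumptions: `dist` is the error distribution and the weights sum to 1.
    """
    if len(taus) != len(weights):
        raise ValueError("taus and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights are assumed to sum to 1")
    # Error density evaluated at each quantile b_tau.
    dens = [dist.pdf(dist.inv_cdf(t)) for t in taus]
    total = 0.0
    for q, (t_q, w_q) in enumerate(zip(taus, weights)):
        for qp, (t_qp, w_qp) in enumerate(zip(taus, weights)):
            cov = (min(t_q, t_qp) - t_q * t_qp) / (dens[q] * dens[qp])
            total += w_q * w_qp * cov
    return total


# A single median versus an equally weighted average of the three quartiles.
single = combined_variance_factor([0.5], [1.0])
combined = combined_variance_factor([0.25, 0.5, 0.75], [1 / 3, 1 / 3, 1 / 3])
print(single, combined)
```

In this normal-error example, averaging the three quartile estimators gives a smaller factor than the median alone, which is the kind of efficiency gain a weighted combination over several quantiles aims for. Equal weights are used only for simplicity and are not claimed to be the weights chosen by the paper.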

Then, the density function of ${\hat{\beta }}$ satisfies

$$\begin{aligned} f({\hat{\beta }})\rightarrow \left\{ \det \left( 2\pi N^{-1}\varSigma (\mathbf{w})\right) \right\} ^{-1/2} \exp \left\{ -\frac{N}{2}(\hat{{\beta }}-{\beta }_0)^{\top }\varSigma ^{-1}(\mathbf{w})(\hat{{\beta }}-{\beta }_0) \right\} . \end{aligned}$$

Thus, under conditions in Theorem 2.1,

$$\begin{aligned} \sqrt{N}(\hat{{\beta }}-{\beta }_0)\xrightarrow {L} \mathcal{N}\left( 0,\varSigma (\mathbf{w})\right) . \end{aligned}$$

This completes the proof of Theorem 2.1. $\square $

About this article

Cite this article

Jiang, R., Chen, Ww. & Liu, X. Adaptive quantile regressions for massive datasets. Stat Papers 62, 1981–1995 (2021). https://doi.org/10.1007/s00362-020-01170-8

Received: 11 July 2019
Revised: 15 January 2020
Published: 23 March 2020
Issue Date: August 2021
DOI: https://doi.org/10.1007/s00362-020-01170-8
