Fast algorithms for the quantile regression process

Chernozhukov, Victor; Fernández-Val, Iván; Melly, Blaise

doi:10.1007/s00181-020-01898-0


Published: 12 July 2020

Empirical Economics, volume 62, pages 7–33 (2022)

Victor Chernozhukov, Iván Fernández-Val & Blaise Melly (ORCID: orcid.org/0000-0001-9592-8749)


Abstract

The widespread use of quantile regression methods depends crucially on the existence of fast algorithms. Despite numerous algorithmic improvements, the computation time is still non-negligible because researchers often estimate many quantile regressions and use the bootstrap for inference. We suggest two new fast algorithms for the estimation of a sequence of quantile regressions at many quantile indexes. The first algorithm applies the preprocessing idea of Portnoy and Koenker (Stat Sci 12(4):279–300, 1997) but exploits a previously estimated quantile regression to guess the sign of the residuals. This step allows for a reduction in the effective sample size. The second algorithm starts from a previously estimated quantile regression at a similar quantile index and updates it using a single Newton–Raphson iteration. The first algorithm is exact, while the second is only asymptotically equivalent to the traditional quantile regression estimator. We also apply the preprocessing idea to the bootstrap by using the sample estimates to guess the sign of the residuals in the bootstrap sample. Simulations show that our new algorithms provide very large improvements in computation time without significant (if any) cost in the quality of the estimates. For instance, we reduce the time required to estimate 99 quantile regressions with 20 regressors and 50,000 observations by a factor of 100.
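The preprocessing idea can be illustrated in the simplest possible setting, an intercept-only model, where the quantile regression estimate reduces to a sample quantile. The sketch below is not the authors' implementation: the function names, the `band` tuning constant and the band-doubling fallback are assumptions made for illustration. A preliminary estimate (here the quantile at a nearby index) is used to guess the sign of most residuals, only the observations with an uncertain sign are sorted, and the guesses are verified before the result is accepted, so the answer remains exact.

```python
import math
import random


def exact_quantile(y, tau):
    """Return the order statistic y_(k), k = ceil(n * tau), a check-loss minimizer."""
    if not y or not 0.0 < tau < 1.0:
        raise ValueError("need a non-empty sample and 0 < tau < 1")
    k = math.ceil(len(y) * tau)
    return sorted(y)[k - 1]


def preprocessed_quantile(y, tau, guess, band):
    """Intercept-only analogue of preprocessing (illustrative sketch only).

    `guess` is a preliminary estimate and `band` a hypothetical tuning
    constant. Observations below guess - band are presumed to have negative
    residuals, those above guess + band positive residuals; only the
    observations in between are sorted.
    """
    if not y or not 0.0 < tau < 1.0 or band <= 0:
        raise ValueError("need a non-empty sample, 0 < tau < 1 and band > 0")
    k = math.ceil(len(y) * tau)
    while True:
        lo, hi = guess - band, guess + band
        n_below = sum(1 for v in y if v < lo)   # residual sign guessed negative
        kept = [v for v in y if lo <= v <= hi]  # sign left undetermined
        rank = k - n_below                      # target rank inside the kept set
        # The sign guesses are verified when the target rank falls in the kept set.
        if 1 <= rank <= len(kept):
            return sorted(kept)[rank - 1], len(kept)
        band *= 2.0  # some guess was wrong: widen the band and retry


random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(50_000)]
q50 = exact_quantile(y, 0.50)
q51, n_sorted = preprocessed_quantile(y, 0.51, guess=q50, band=0.1)
# Compare with the full-sample computation and report the effective sample size.
print(q51 == exact_quantile(y, 0.51), n_sorted, len(y))
```

In the regression case the discarded observations are not simply counted but collapsed into pseudo-observations, and the reduced problem is solved by a quantile regression routine; the verification step plays the same role as the rank check here.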


Notes

Among others, see Chapter 1 in Stigler (1986).
See Koenker (2000) and Koenker (2017) for a more detailed historical account of the computation of median regression.
See Koenker and Bassett (1978).
On the other hand, note that quantile regression is not robust to outliers in the x-direction.
Parametric programming is a technique for investigating the effects of a change in the parameters (here of the quantile index \(\tau \)) of the objective function.
See, e.g., Chernozhukov et al. (2013).
It is actually possible to use the estimates from the previous quantile regression as starting values for the next quantile regression. These better starting values allow for reducing the computing time and are, therefore, used by all our algorithms.
In his comment on Portnoy and Koenker (1997), Thisted (1997) suggests this idea, which, to the best of our knowledge, has never been implemented.
We start from the median regression in the simulations and application in Sects. 6 and 7.
Schmidt and Zhu (2016) have suggested a different iterative estimation strategy. They also start from one quantile regression, but they add or subtract sums of nonnegative functions to it to calculate other quantiles. Their procedure has a different objective (monotonicity of the estimated conditional quantile function), and their estimator is not asymptotically equivalent to the traditional quantile regression estimator.
A similar idea could be applied to adjust the constant m in Algorithm 2. The additional difficulty is that the optimal constant probably depends on the quantile index \(\tau \), which is not the case for the bootstrap.
Algorithm 2 can be slightly improved by using preprocessing with \(\hat{\beta }(\tau _1)\) as a preliminary estimate of \(\hat{\beta }^{*b}(\tau _1)\) instead of computing it completely from scratch.
We provide the results for the median regression, but the ranking was similar at other quantile indexes.
To make the estimates comparable across quantiles and regressors, we first normalize them such that they have unit variance in the specification with \(n=50,000\). Then, we calculate the measures of performance separately for each parameter and average them over all quantile indices and regressors. The reported relative MSE and MAE are the averaged relative MSE and MAE. Alternatively, it is possible to calculate the ratio of the averaged MSE and MAE with the results in Table 2. These ratios of averages and averages of ratios are very similar.
In small samples, they are both sensitive to the exact choice of the bandwidth. Pouliot (2020) gives simulation evidence of this sensitivity.
See Chernozhukov and Fernández-Val (2005), Angrist et al. (2006), Chernozhukov and Hansen (2006) and Belloni et al. (2017) for more details and proofs of the validity of these procedures.
Due to computational limitations, Abrevaya (2001) and Koenker and Hallock (2001) used only the June subsample to estimate 5 and 15 different quantile regressions, respectively, and they avoided bootstrapping the results. Of course, computers have become more powerful in the meantime.
See the supplementary appendix SA to Chernozhukov et al. (2013) for the construction and the validity of the uniform bands.
The largest p-value is 0.02 for high school.
See Powell (1987) for the censored quantile regression estimator, Chernozhukov and Hansen (2006) for the instrumental variable quantile regression estimator and Kordas (2006) for the binary quantile regression estimator, which generalizes the maximum score estimator of Manski (1975).
Pouliot (2020) suggests a preprocessing algorithm for instrumental variable quantile regression.
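Several notes above concern moving from one quantile index to a neighbouring one (starting values, the choice of the starting quantile, bandwidth sensitivity). As a rough illustration of the one-step Newton–Raphson update described in the abstract, consider again the intercept-only case, where a standard form of the update is \(q(\tau + \varepsilon) \approx q(\tau) + (\tau + \varepsilon - \hat{F}(q(\tau))) / \hat{f}(q(\tau))\), with \(\hat{F}\) the empirical distribution function and \(\hat{f}\) a kernel density estimate. The Gaussian kernel, the rule-of-thumb bandwidth and all function names below are assumptions made for this sketch, not the paper's choices.

```python
import math
import random
import statistics


def ecdf(y, q):
    """Empirical distribution function of the sample y evaluated at q."""
    return sum(1 for v in y if v <= q) / len(y)


def kernel_density(y, q, h):
    """Gaussian-kernel density estimate at q with bandwidth h."""
    c = 1.0 / (len(y) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((v - q) / h) ** 2) for v in y)


def one_step_quantile(y, q_prev, tau_next, h):
    """One Newton-Raphson step from q_prev towards the tau_next quantile.

    The gradient of the average check loss at q_prev is ecdf - tau_next and
    the Hessian is approximated by the density estimate at q_prev.
    """
    score = tau_next - ecdf(y, q_prev)
    return q_prev + score / kernel_density(y, q_prev, h)


random.seed(0)
n = 20_000
y = [random.gauss(0.0, 1.0) for _ in range(n)]
y_sorted = sorted(y)
q50 = y_sorted[math.ceil(0.50 * n) - 1]           # exact sample median
h = 1.06 * statistics.stdev(y) * n ** (-0.2)      # rule-of-thumb bandwidth (assumption)
q52_one_step = one_step_quantile(y, q50, 0.52, h)
q52_exact = y_sorted[math.ceil(0.52 * n) - 1]     # exact 0.52 sample quantile
print(q52_one_step, q52_exact)
```

Unlike the preprocessing sketch, this update is only an approximation: it avoids sorting or re-optimizing, but its quality depends on the density estimate and hence on the bandwidth `h`.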

References

Abrevaya J (2001) The effects of demographics and maternal behavior on the distribution of birth outcomes. Empir Econ 26(1):247–257
Angrist J, Chernozhukov V, Fernández-Val I (2006) Quantile regression under misspecification, with an application to the US wage structure. Econometrica 74:539–563
Baidal JAW, Locks LM, Cheng ER, Blake-Lamb TL, Perkins ME, Taveras EM (2016) Risk factors for childhood obesity in the first 1,000 days: a systematic review. Am J Prev Med 50(6):761–779
Barrodale I, Roberts F (1974) Solution of an overdetermined system of equations in the \(\ell_1\) norm [F4]. Commun ACM 17(6):319–320
Belloni A, Chernozhukov V, Fernández-Val I, Hansen C (2017) Program evaluation and causal inference with high-dimensional data. Econometrica 85(1):233–298
Black SE, Devereux PJ, Salvanes KG (2007) From the cradle to the labor market? The effect of birth weight on adult outcomes. Q J Econ 122(1):409–439
Chernozhukov V, Fernández-Val I (2005) Subsampling inference on quantile regression processes. Sankhya Indian J Stat 67:253–276
Chernozhukov V, Fernández-Val I (2011) Inference for extremal conditional quantile models, with an application to market and birthweight risks. Rev Econ Stud 78(2):559–589
Chernozhukov V, Hansen C (2006) Instrumental quantile regression inference for structural and treatment effect models. J Econom 132:491–525
Chernozhukov V, Fernández-Val I, Melly B (2013) Inference on counterfactual distributions. Econometrica 81(6):2205–2268
Fortin N, Lemieux T, Firpo S (2011) Decomposition methods in economics. In: Handbook of labor economics, vol 4. Elsevier, pp 1–102
Giné E, Zinn J et al (1984) Some limit theorems for empirical processes. Ann Probab 12(4):929–989
Hagemann A (2017) Cluster-robust bootstrap inference in quantile regression models. J Am Stat Assoc 112(517):446–456
Hahn J (1997) Bayesian bootstrap of the quantile regression estimator: a large sample study. Int Econ Rev 38:795–808
Hall P, Sheather SJ (1988) On the distribution of a studentized quantile. J R Stat Soc Ser B 50:381–391
He F, Cheng Y, Tong T (2016) Estimation of extreme conditional quantiles through an extrapolation of intermediate regression quantiles. Stat Probab Lett 113:30–37
Kline P, Santos A (2012) A score based approach to wild bootstrap inference. J Econom Methods 1(1):23–41
Koenker R (2000) Galton, Edgeworth, Frisch, and prospects for quantile regression in econometrics. J Econom 95(2):347–374
Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge
Koenker R (2017) Computational methods for quantile regression. In: Handbook of quantile regression, Chapman and Hall/CRC, pp 55–67
Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50
Koenker R, d’Orey V (1994) Remark AS R92: a remark on Algorithm AS 229: computing dual regression quantiles and regression rank scores. J R Stat Soc Ser C (Appl Stat) 43(2):410–414
Koenker R, Hallock KF (2001) Quantile regression. J Econ Perspect 15:143–156
Koenker R, Portnoy S (1987) L-estimation for linear models. J Am Stat Assoc 82(399):851–857
Koenker R, Xiao Z (2002) Inference on the quantile regression process. Econometrica 70:1583–1612
Koenker R, Chernozhukov V, He X, Peng L (2017) Handbook of quantile regression. CRC Press, Boca Raton
Koenker RW, D’Orey V (1987) Algorithm AS 229: computing regression quantiles. J R Stat Soc Ser C (Appl Stat) 36(3):383–393
Kordas G (2006) Smoothed binary regression quantiles. J Appl Econom 21(3):387–407
Le Cam L (1956) On the asymptotic theory of estimation and testing hypotheses. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability, vol 1: contributions to the theory of statistics. University of California Press, Berkeley: pp 129–156
Machado J, Mata J (2005) Counterfactual decomposition of changes in wage distributions using quantile regression. J Appl Econom 20:445–465
Mammen E et al (1993) Bootstrap and wild bootstrap for high dimensional linear models. Ann Stat 21(1):255–285
Manski CF (1975) Maximum score estimation of the stochastic utility model of choice. J Econom 3:205–228
Neocleous T, Portnoy S (2008) On monotonicity of regression quantile functions. Stat Probab Lett 78(10):1226–1229
Portnoy S (1991) Asymptotic behavior of the number of regression quantile breakpoints. SIAM J Sci Stat Comput 12(4):867–883
Portnoy S, Koenker R (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Stat Sci 12(4):279–300
Pouliot GA (2020) Instrumental variables quantile regression with multivariate endogenous variable. Unpublished manuscript
Powell JL (1987) Semiparametric estimation of bivariate latent variable models. Unpublished manuscript, University of Wisconsin-Madison
Powell JL (1991) Estimation of monotonic regression models under quantile restrictions. Nonparametric and semiparametric methods in econometrics. Cambridge University Press, New York, pp 357–384
Schmidt L, Zhu Y (2016) Quantile spacings: a simple method for the joint estimation of multiple quantiles without crossing. Available at SSRN 2220901
Stigler SM (1986) The history of statistics: the measurement of uncertainty before 1900. Harvard University Press, Cambridge
Thisted RA (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators: comment. Stat Sci 12(4):296–298
van der Vaart A (1998) Asymptotic statistics. Cambridge University Press, Cambridge
Volgushev S, Chao SK, Cheng G et al (2019) Distributed inference for quantile regression processes. Ann Stat 47(3):1634–1662
Wang H, Li D, He X (2012) Estimation of high conditional quantiles for heavy-tailed distributions. J Am Stat Assoc 107:1453–1464


Acknowledgements

We would like to thank the associate editor Roger Koenker, two anonymous referees and the participants at the conference “Economic Applications of Quantile Regressions 2.0”, held at the Nova School of Business and Economics, for useful comments.

Author information

Authors and Affiliations

Department of Economics, Massachusetts Institute of Technology, Cambridge, MA, USA
Victor Chernozhukov
Department of Economics, Boston University, Boston, MA, USA
Iván Fernández-Val
Department of Economics, University of Bern, Bern, Switzerland
Blaise Melly


Corresponding author

Correspondence to Blaise Melly.


About this article

Cite this article

Chernozhukov, V., Fernández-Val, I. & Melly, B. Fast algorithms for the quantile regression process. Empir Econ 62, 7–33 (2022). https://doi.org/10.1007/s00181-020-01898-0


Received: 30 July 2019
Accepted: 04 June 2020
Published: 12 July 2020
Issue Date: January 2022
DOI: https://doi.org/10.1007/s00181-020-01898-0
