Fast algorithms for the quantile regression process

Empirical Economics

Abstract

The widespread use of quantile regression methods depends crucially on the existence of fast algorithms. Despite numerous algorithmic improvements, the computation time is still non-negligible because researchers often estimate many quantile regressions and use the bootstrap for inference. We suggest two new fast algorithms for the estimation of a sequence of quantile regressions at many quantile indexes. The first algorithm applies the preprocessing idea of Portnoy and Koenker (Stat Sci 12(4):279–300, 1997) but exploits a previously estimated quantile regression to guess the sign of the residuals. This step reduces the effective sample size. The second algorithm starts from a previously estimated quantile regression at a similar quantile index and updates it using a single Newton–Raphson iteration. The first algorithm is exact, while the second is only asymptotically equivalent to the traditional quantile regression estimator. We also apply the preprocessing idea to the bootstrap by using the sample estimates to guess the sign of the residuals in the bootstrap sample. Simulations show that our new algorithms provide very large improvements in computation time with little or no cost in the quality of the estimates. For instance, we reduce the time required to estimate 99 quantile regressions with 20 regressors and 50,000 observations by a factor of 100.
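The first algorithm builds on Portnoy and Koenker's (1997) preprocessing. Because the check function \(\rho_\tau(u) = u(\tau - 1\{u<0\})\) is linear on each side of zero, observations whose residual sign is known in advance enter the objective only through the sum of their covariates, so they can be collapsed into two pseudo-observations ("globs"). The Python sketch below shows only this splitting step and the check that the guessed signs hold at the solution. The band rule (keep the `M` observations whose residual ranks under a preliminary fit `beta_prev` are nearest \(n\tau\)), the constant `bound`, and all function names are illustrative assumptions rather than the paper's exact choices, and the linear programming solver for the reduced problem is omitted.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def check_loss(u: float, tau: float) -> float:
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def _fit(x: Sequence[float], beta: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, beta))


def objective(X: Matrix, y: List[float], beta: Sequence[float], tau: float) -> float:
    """Quantile regression objective: sum_i rho_tau(y_i - x_i'beta)."""
    return sum(check_loss(yi - _fit(xi, beta), tau) for xi, yi in zip(X, y))


def preprocess(X: Matrix, y: List[float], beta_prev: Sequence[float], tau: float,
               M: int, bound: float = 1e10
               ) -> Tuple[Matrix, List[float], List[int], List[int]]:
    """Guess residual signs from beta_prev and collapse the rest into two globs.

    Hypothetical band rule: rank the residuals of the preliminary fit and keep
    the M observations whose ranks are closest to n * tau.
    """
    n, p = len(y), len(X[0])
    resid = [yi - _fit(xi, beta_prev) for xi, yi in zip(X, y)]
    order = sorted(range(n), key=lambda i: resid[i])
    lo = max(0, int(n * tau) - M // 2)
    hi = min(n, lo + M)
    low, keep, high = order[:lo], order[lo:hi], order[hi:]
    # Each glob is one pseudo-observation: the summed covariates of its members
    # and an extreme response, so its residual sign stays fixed for bounded beta.
    x_low = [sum(X[i][j] for i in low) for j in range(p)]
    x_high = [sum(X[i][j] for i in high) for j in range(p)]
    X_red = [X[i] for i in keep] + [x_low, x_high]
    y_red = [y[i] for i in keep] + [-bound, bound]
    return X_red, y_red, low, high


def sign_violations(X: Matrix, y: List[float], beta: Sequence[float],
                    low: List[int], high: List[int]) -> List[int]:
    """Indices whose residual sign at beta contradicts the guessed sign."""
    bad = [i for i in low if y[i] - _fit(X[i], beta) > 0]
    bad += [i for i in high if y[i] - _fit(X[i], beta) < 0]
    return bad
```

If `sign_violations` returns any indices, the guess failed for those observations; in Portnoy and Koenker's scheme they are moved back into the kept set (or the band is widened) and the reduced problem is solved again, which is what keeps the procedure exact. For the bootstrap variant mentioned above, one would pass the full-sample estimate at the same \(\tau\) as `beta_prev`.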

Fig. 1
Fig. 2
Fig. 3

Notes

  1. Among others, see Chapter 1 in Stigler (1986).

  2. See Koenker (2000) and Koenker (2017) for a more detailed historical account of the computation of median regression.

  3. See Koenker and Bassett (1978).

  4. On the other hand, note that quantile regression is not robust to outliers in the x-direction.

  5. Parametric programming is a technique for investigating the effects of a change in the parameters (here of the quantile index \(\tau \)) of the objective function.

  6. See, e.g., Chernozhukov et al. (2013).

  7. It is possible to use the estimates from the previous quantile regression as starting values for the next quantile regression. These better starting values reduce the computing time and are, therefore, used by all our algorithms.

  8. In his comment on Portnoy and Koenker (1997), Thisted (1997) suggests this idea, which, to the best of our knowledge, has never been implemented.

  9. We start from the median regression in the simulations and application in Sects. 6 and 7.

  10. Schmidt and Zhu (2016) have suggested a different iterative estimation strategy. They also start from one quantile regression, but they add or subtract sums of nonnegative functions to it to calculate other quantiles. Their procedure has a different objective (monotonicity of the estimated conditional quantile function), and their estimator is not asymptotically equivalent to the traditional quantile regression estimator.

  11. A similar idea could be applied to adjust the constant m in Algorithm 2. The additional difficulty is that the optimal constant probably depends on the quantile index \(\tau \), which is not the case for the bootstrap.

  12. Algorithm 2 can be slightly improved by using preprocessing with \(\hat{\beta }(\tau _1)\) as a preliminary estimate of \(\hat{\beta }^{*b}(\tau _1)\) instead of computing it completely from scratch.

  13. We provide the results for the median regression, but the ranking was similar at other quantile indexes.

  14. To make the estimates comparable across quantiles and regressors, we first normalize them such that they have unit variance in the specification with \(n=50,000\). Then, we calculate the measures of performance separately for each parameter and average them over all quantile indexes and regressors. The reported relative MSE and MAE are the averaged relative MSE and MAE. Alternatively, it is possible to calculate the ratio of the averaged MSE and MAE with the results in Table 2. These ratios of averages and averages of ratios are very similar.

  15. In small samples, they are both sensitive to the exact choice of the bandwidth. Pouliot (2020) gives simulation evidence of this sensitivity.

  16. See Chernozhukov and Fernández-Val (2005), Angrist et al. (2006), Chernozhukov and Hansen (2006) and Belloni et al. (2017) for more details and proofs of the validity of these procedures.

  17. Due to computational limitations, Abrevaya (2001) and Koenker and Hallock (2001) used only the June subsample to estimate 5 and 15 different quantile regressions, respectively, and they did not bootstrap the results. Of course, computers have become more powerful since then.

  18. See the supplementary appendix SA to Chernozhukov et al. (2013) for the construction and the validity of the uniform bands.

  19. The largest p-value is 0.02 for high school.

  20. See Powell (1987) for the censored quantile regression estimator, Chernozhukov and Hansen (2006) for the instrumental variable quantile regression estimator and Kordas (2006) for the binary quantile regression estimator, which is a generalization of Manski's (1975) maximum score estimator.

  21. Pouliot (2020) suggests a preprocessing algorithm for instrumental variable quantile regression.
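The second algorithm in the abstract, combined with the median starting point described in note 9, can be illustrated with the Python sketch below. It is a minimal illustration under several assumptions, not the authors' code: the Jacobian \(J\) is estimated with a uniform kernel of bandwidth `h` (the paper's estimator and bandwidth rule may differ), the exact estimate at the grid point nearest 0.5 (`beta_start`) is assumed to come from a standard quantile regression solver that is not shown, and all function names are hypothetical. Each update applies one Newton–Raphson step to the moment condition \(\frac{1}{n}\sum_i x_i\{\tau - 1(y_i \le x_i'\beta)\} = 0\), that is, \(\hat\beta(\tau_j) = \hat\beta(\tau_{j-1}) + \hat J^{-1}\frac{1}{n}\sum_i x_i\{\tau_j - 1(y_i \le x_i'\hat\beta(\tau_{j-1}))\}\).

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def _solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def one_step_update(X: Matrix, y: List[float], beta_prev: Sequence[float],
                    tau_new: float, h: float) -> List[float]:
    """beta_prev + J^{-1} (1/n) sum_i x_i (tau_new - 1{y_i <= x_i'beta_prev})."""
    n, p = len(y), len(X[0])
    J = [[0.0] * p for _ in range(p)]
    score = [0.0] * p
    for xi, yi in zip(X, y):
        r = yi - sum(a * b for a, b in zip(xi, beta_prev))
        w = tau_new - (1.0 if r <= 0 else 0.0)
        for j in range(p):
            score[j] += xi[j] * w / n
        if abs(r) <= h:  # uniform kernel: only residuals within h of zero count
            for j in range(p):
                for k in range(p):
                    J[j][k] += xi[j] * xi[k] / (2.0 * h * n)
    step = _solve(J, score)
    return [b + s for b, s in zip(beta_prev, step)]


def process_from_median(X: Matrix, y: List[float], taus: Sequence[float],
                        beta_start: Sequence[float], h: float) -> List[List[float]]:
    """Walk the sorted grid outward from the index nearest 0.5, one step each."""
    taus = sorted(taus)
    m = min(range(len(taus)), key=lambda j: abs(taus[j] - 0.5))
    est: List[Optional[List[float]]] = [None] * len(taus)
    est[m] = list(beta_start)
    for j in range(m + 1, len(taus)):  # upward from the median
        est[j] = one_step_update(X, y, est[j - 1], taus[j], h)
    for j in range(m - 1, -1, -1):  # downward from the median
        est[j] = one_step_update(X, y, est[j + 1], taus[j], h)
    return est
```

The returned list follows the sorted order of `taus`. The bandwidth `h` must be large enough that some residuals fall within it; otherwise the estimated Jacobian is singular.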

References

  • Abrevaya J (2001) The effects of demographics and maternal behavior on the distribution of birth outcomes. Empir Econ 26(1):247–257

  • Angrist J, Chernozhukov V, Fernández-Val I (2006) Quantile regression under misspecification, with an application to the US wage structure. Econometrica 74:539–563

  • Baidal JAW, Locks LM, Cheng ER, Blake-Lamb TL, Perkins ME, Taveras EM (2016) Risk factors for childhood obesity in the first 1,000 days: a systematic review. Am J Prev Med 50(6):761–779

  • Barrodale I, Roberts F (1974) Solution of an overdetermined system of equations in the l1 norm [F4]. Commun ACM 17(6):319–320

  • Belloni A, Chernozhukov V, Fernández-Val I, Hansen C (2017) Program evaluation and causal inference with high-dimensional data. Econometrica 85(1):233–298

  • Black SE, Devereux PJ, Salvanes KG (2007) From the cradle to the labor market? The effect of birth weight on adult outcomes. Q J Econ 122(1):409–439

  • Chernozhukov V, Fernández-Val I (2005) Subsampling inference on quantile regression processes. Sankhya Indian J Stat 67:253–276

  • Chernozhukov V, Fernández-Val I (2011) Inference for extremal conditional quantile models, with an application to market and birthweight risks. Rev Econ Stud 78(2):559–589

  • Chernozhukov V, Hansen C (2006) Instrumental quantile regression inference for structural and treatment effect models. J Econom 132:491–525

  • Chernozhukov V, Fernández-Val I, Melly B (2013) Inference on counterfactual distributions. Econometrica 81(6):2205–2268

  • Fortin N, Lemieux T, Firpo S (2011) Decomposition methods in economics. In: Handbook of labor economics, vol 4. Elsevier, pp 1–102

  • Giné E, Zinn J (1984) Some limit theorems for empirical processes. Ann Probab 12(4):929–989

  • Hagemann A (2017) Cluster-robust bootstrap inference in quantile regression models. J Am Stat Assoc 112(517):446–456

  • Hahn J (1997) Bayesian bootstrap of the quantile regression estimator: a large sample study. Int Econ Rev 38:795–808

  • Hall P, Sheather SJ (1988) On the distribution of a studentized quantile. J R Stat Soc Ser B 50:381–391

  • He F, Cheng Y, Tong T (2016) Estimation of extreme conditional quantiles through an extrapolation of intermediate regression quantiles. Stat Probab Lett 113:30–37

  • Kline P, Santos A (2012) A score based approach to wild bootstrap inference. J Econom Methods 1(1):23–41

  • Koenker R (2000) Galton, Edgeworth, Frisch, and prospects for quantile regression in econometrics. J Econom 95(2):347–374

  • Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge

  • Koenker R (2017) Computational methods for quantile regression. In: Handbook of quantile regression. Chapman and Hall/CRC, pp 55–67

  • Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50

  • Koenker R, d’Orey V (1994) Remark AS R92: a remark on algorithm AS 229: computing dual regression quantiles and regression rank scores. J R Stat Soc Ser C (Appl Stat) 43(2):410–414

  • Koenker R, Hallock KF (2001) Quantile regression. J Econ Perspect 15:143–156

  • Koenker R, Portnoy S (1987) L-estimation for linear models. J Am Stat Assoc 82(399):851–857

  • Koenker R, Xiao Z (2002) Inference on the quantile regression process. Econometrica 70:1583–1612

  • Koenker R, Chernozhukov V, He X, Peng L (2017) Handbook of quantile regression. CRC Press, Boca Raton

  • Koenker RW, d’Orey V (1987) Algorithm AS 229: computing regression quantiles. J R Stat Soc Ser C (Appl Stat) 36(3):383–393

  • Kordas G (2006) Smoothed binary regression quantiles. J Appl Econom 21(3):387–407

  • Le Cam L (1956) On the asymptotic theory of estimation and testing hypotheses. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability, vol 1: contributions to the theory of statistics. University of California Press, Berkeley, pp 129–156

  • Machado J, Mata J (2005) Counterfactual decomposition of changes in wage distributions using quantile regression. J Appl Econom 20:445–465

  • Mammen E (1993) Bootstrap and wild bootstrap for high dimensional linear models. Ann Stat 21(1):255–285

  • Manski CF (1975) Maximum score estimation of the stochastic utility model of choice. J Econom 3:205–228

  • Neocleous T, Portnoy S (2008) On monotonicity of regression quantile functions. Stat Probab Lett 78(10):1226–1229

  • Portnoy S (1991) Asymptotic behavior of the number of regression quantile breakpoints. SIAM J Sci Stat Comput 12(4):867–883

  • Portnoy S, Koenker R (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Stat Sci 12(4):279–300

  • Pouliot GA (2020) Instrumental variables quantile regression with multivariate endogenous variable. Unpublished manuscript

  • Powell JL (1987) Semiparametric estimation of bivariate latent variable models. Unpublished manuscript, University of Wisconsin-Madison

  • Powell JL (1991) Estimation of monotonic regression models under quantile restrictions. In: Nonparametric and semiparametric methods in econometrics. Cambridge University Press, New York, pp 357–384

  • Schmidt L, Zhu Y (2016) Quantile spacings: a simple method for the joint estimation of multiple quantiles without crossing. Available at SSRN 2220901

  • Stigler SM (1986) The history of statistics: the measurement of uncertainty before 1900. Harvard University Press, Cambridge

  • Thisted RA (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators: comment. Stat Sci 12(4):296–298

  • van der Vaart A (1998) Asymptotic statistics. Cambridge University Press, Cambridge

  • Volgushev S, Chao SK, Cheng G (2019) Distributed inference for quantile regression processes. Ann Stat 47(3):1634–1662

  • Wang H, Li D, He X (2012) Estimation of high conditional quantiles for heavy-tailed distributions. J Am Stat Assoc 107:1453–1464

Acknowledgements

We would like to thank the associate editor Roger Koenker, two anonymous referees and the participants in the conference “Economic Applications of Quantile Regressions 2.0” at the Nova School of Business and Economics for useful comments.

Author information

Correspondence to Blaise Melly.

About this article

Cite this article

Chernozhukov, V., Fernández-Val, I. & Melly, B. Fast algorithms for the quantile regression process. Empir Econ 62, 7–33 (2022). https://doi.org/10.1007/s00181-020-01898-0
