Logistic Quantile Regression for Bounded Outcomes Using a Family of Heavy-Tailed Distributions

Abstract

Mean regression models can be inadequate when the probability distribution of the observed responses is not symmetric. In such situations, quantile regression is a more robust alternative for accommodating outliers and misspecification of the error distribution, since it characterizes the entire conditional distribution of the outcome variable. This paper proposes a robust logistic quantile regression model that combines a logit link function with the EM-based algorithm of Galarza et al. (Stat 6, 1, 2017) for maximum likelihood estimation of the pth quantile regression parameters. This quantile regression (QR) model is built on a generalized class of skewed distributions comprising skewed versions of the normal, Student’s t, Laplace, contaminated normal and slash distributions, among other heavy-tailed distributions. We evaluate how well our proposal accommodates bounded responses on a synthetic dataset, for which we consider a full model including categorical and continuous covariates as well as several of its sub-models. For the full model, we compare our proposal with a non-parametric alternative from the quantreg R package. The algorithm is implemented in the R package lqr, which provides full estimation and inference for the parameters, automatic selection of the best model, and simulated envelope plots that are useful for assessing goodness of fit.
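The logit link works because quantiles are equivariant under monotone transformations: if \(h(y) = \log\{y/(1-y)\}\), then the pth quantile of \(h(Y)\) equals \(h\) applied to the pth quantile of \(Y\), so a linear quantile model fitted on the logit scale can be mapped back to (0, 1) with the inverse logit (Bottai et al., 2010). The Python sketch below illustrates this property for an intercept-only model only. It minimizes the check function \(\rho_p(u) = u\{p - I(u < 0)\}\) of Koenker and Bassett (1978) by brute force instead of using the EM algorithm implemented in lqr. The function names are hypothetical, and the responses are toy values chosen for illustration.

```python
import math


def logit(y: float) -> float:
    """Map a bounded outcome y in (0, 1) to the real line."""
    return math.log(y / (1.0 - y))


def expit(eta: float) -> float:
    """Inverse logit: map a value on the real line back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def check_loss(u: float, p: float) -> float:
    """Check function rho_p(u) = u * (p - I(u < 0))."""
    return u * (p - (1.0 if u < 0 else 0.0))


def sample_quantile(values: list[float], p: float) -> float:
    """p-th sample quantile as a minimiser of sum_i rho_p(v_i - q).

    A minimiser is always attained at one of the data points, so searching
    over them is enough (O(n^2), acceptable for a toy example).
    """
    return min(values, key=lambda q: sum(check_loss(v - q, p) for v in values))


if __name__ == "__main__":
    y = [0.12, 0.35, 0.48, 0.61, 0.93]  # toy bounded responses
    p = 0.25
    q_direct = sample_quantile(y, p)                            # quantile on the (0, 1) scale
    q_logit = expit(sample_quantile([logit(v) for v in y], p))  # fit on logit scale, map back
    print(q_direct, q_logit)  # both equal 0.35 up to rounding
```

With covariates, the same idea gives \(Q_p(y \mid \mathbf{x}) = \mathrm{expit}(\mathbf{x}^{\top}\boldsymbol{\beta}_p)\), which always stays inside the unit interval.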

Figures 1–9 (images not available)

References

  1. Andrews, D. F. and Mallows, C. L. (1974). Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B 36, 99–102.

  2. Barndorff-Nielsen, O. E. and Shephard, N. (2001). Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics. Journal of the Royal Statistical Society, Series B 63, 167–241.

  3. Barrodale, I. and Roberts, F. (1977). Algorithms for restricted least absolute value estimation. Communications in Statistics - Simulation and Computation 6, 353–363.

  4. Bayes, C. L., Bazan, J. L. and De Castro, M. (2017). A quantile parametric mixed regression model for bounded response variables. Statistics and Its Interface 10, 483–493.

  5. Benites, L., Lachos, V. H. and Vilca, F. (2013). Likelihood based inference for quantile regression using the asymmetric Laplace distribution. Technical Report 15, Universidade Estadual de Campinas.

  6. Bottai, M., Cai, B. and McKeown, R. E. (2010). Logistic quantile regression for bounded outcomes. Statistics in Medicine 29, 2, 309–317.

  7. Dempster, A., Laird, N. and Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38.

  8. Ferrari, S. and Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics 31, 7, 799–815.

  9. Galarza, C. E., Benites, L. and Lachos, V. H. (2015). lqr: Robust Linear Quantile Regression. R package version 1.5.

  10. Galarza, C., Lachos, V., Barbosa Cabral, C. and Castro Cepero, L. (2017). Robust quantile regression using a generalized class of skewed distributions. Stat 6, 1.

  11. Galvis, D. M., Bandyopadhyay, D. and Lachos, V. H. (2014). Augmented mixed beta regression models for periodontal proportion data. Statistics in Medicine 33, 21, 3759–3771.

  12. Gómez-Déniz, E., Sordo, M. A. and Calderín-Ojeda, E. (2014). The log-Lindley distribution as an alternative to the beta regression model with applications in insurance. Insurance: Mathematics and Economics 54, 49–57.

  13. Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica: Journal of the Econometric Society 46, 33–50.

  14. Koenker, R. W. and d’Orey, V. (1987). Algorithm AS 229: Computing regression quantiles. Journal of the Royal Statistical Society, Series C (Applied Statistics) 36, 3, 383–393.

  15. Koenker, R. (2005). Quantile Regression, 38. Cambridge University Press, Cambridge.

  16. Kottas, A. and Gelfand, A. E. (2001). Bayesian semiparametric median regression modeling. Journal of the American Statistical Association 96, 1458–1468.

  17. Kottas, A. and Krnjajić, M. (2009). Bayesian semiparametric modelling in quantile regression. Scandinavian Journal of Statistics 36, 297–319.

  18. Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology 46, 1-2, 79–88.

  19. Liu, Y. and Wu, Y. (2009). Stepwise multiple quantile regression estimation using non-crossing constraints. Statistics and Its Interface 2, 3, 299–310.

  20. Liu, Y. and Wu, Y. (2011). Simultaneous multiple non-crossing quantile regression estimation using kernel constraints. Journal of Nonparametric Statistics 23, 2, 415–437.

  21. McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society, Series B (Methodological) 42, 2, 109–142.

  22. McCullagh, P. (1984). Generalized linear models. European Journal of Operational Research 16, 3, 285–292.

  23. McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. Chapman & Hall/CRC, London.

  24. Mu, Y. and He, X. (2007). Power transformation toward a linear regression quantile. Journal of the American Statistical Association 102, 477, 269–279.

  25. Paz, R. F. d. et al. (2017). Alternative regression models to beta distribution under Bayesian approach.

  26. Powell, J. L. (1986). Censored regression quantiles. Journal of Econometrics 32, 1, 143–155.

  27. Tian, Y., Tian, M. and Zhu, Q. (2014). Linear Quantile Regression Based on EM Algorithm. Communications in Statistics - Theory and Methods 43, 16, 3464–3484.

  28. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 267–288.

  29. Verkuilen, J. and Smithson, M. (2012). Mixed and mixture regression models for continuous bounded responses using the beta distribution. Journal of Educational and Behavioral Statistics 37, 1, 82–113.

  30. Wichitaksorn, N., Choy, S. and Gerlach, R. (2014). A generalized class of skew distributions and associated robust quantile regression models. Canadian Journal of Statistics 42, 4, 579–596.

  31. Yu, K. and Moyeed, R. (2001). Bayesian quantile regression. Statistics & Probability Letters 54, 437–447.

  32. Zhou, Y.-h., Ni, Z.-x. and Li, Y. (n.d.). Quantile Regression via the EM Algorithm. Communications in Statistics - Simulation and Computation (10), 2162–2172.

Author information

Corresponding author

Correspondence to Víctor H. Lachos.

Appendix

Details of the expectations in the EM algorithm

The conditional distribution of the latent variable given the observed data, \(f(u_{i}|y_{i},\boldsymbol{\theta}^{(k)})\), depends on the functional form of \(h(u_{i}|\nu)\). Table 3 shows the conditional pdf of \(U\) given \(Y\) for specific choices of \(h(u_{i}|\nu)\).
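The dependence on \(h(u_{i}|\nu)\) comes from Bayes' rule. Writing \(f(y_{i}|u_{i},\boldsymbol{\theta}^{(k)})\) for the conditional density of \(Y_i\) given \(U_i = u_i\) (notation assumed here, not taken from the paper), the conditional density and, for instance, the conditional mean of \(U_i\) used in an E-step can be written as follows; for a discrete mixing distribution such as the contaminated normal, the integrals become sums.

```latex
f(u_{i} \mid y_{i}, \boldsymbol{\theta}^{(k)})
  = \frac{f(y_{i} \mid u_{i}, \boldsymbol{\theta}^{(k)})\, h(u_{i} \mid \nu)}
         {\int f(y_{i} \mid u, \boldsymbol{\theta}^{(k)})\, h(u \mid \nu)\, du},
\qquad
E\left[U_{i} \mid y_{i}, \boldsymbol{\theta}^{(k)}\right]
  = \int u\, f(u \mid y_{i}, \boldsymbol{\theta}^{(k)})\, du .
```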

Table 3. Conditional distribution of \(U\) given \(Y\) for specific SKD distributions

In Table 3, \(z_{i} = (y_{i} - \mathbf{x}_{i}^{\top}\boldsymbol{\beta}_{p})/\sigma\) and \(\mathcal{F}(x|\alpha,\lambda)\) denotes the cdf of a \(\mathrm{Gamma}(\alpha,\lambda)\) distribution. Moreover, \(a\) and \(b\) are given by \(a = \nu\,\phi\left(y_{i} \mid \mathbf{x}_{i}^{\top}\boldsymbol{\beta}_{p}, \frac{\gamma^{-1}\sigma^{2}}{4\xi_{i}^{2}}\right)\) and \(b = (1-\nu)\,\phi\left(y_{i} \mid \mathbf{x}_{i}^{\top}\boldsymbol{\beta}_{p}, \frac{\sigma^{2}}{4\xi_{i}^{2}}\right)\), where \(\phi(\cdot \mid \mu, \sigma^{2})\) denotes the normal pdf with mean \(\mu\) and variance \(\sigma^{2}\). The notation \(\mathrm{TG}(\alpha,\lambda,t)\) represents a random variable with \(\mathrm{Gamma}(\alpha,\lambda)\) distribution right-truncated at \(t\). Finally, \(\mathrm{GIG}(\nu,a,b)\) denotes the generalized inverse Gaussian (GIG) distribution (see Barndorff-Nielsen and Shephard (2001) for more details).
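For the contaminated normal case, the expressions for \(a\) and \(b\) imply \(P(U_i = \gamma \mid y_i) = a/(a+b)\) and \(E[U_i \mid y_i] = (\gamma a + b)/(a+b)\), assuming (as is standard for the contaminated normal in its scale-mixture form) that \(U_i = \gamma\) with probability \(\nu\) and \(U_i = 1\) with probability \(1-\nu\). Similarly, if \(\mathrm{Gamma}(\alpha,\lambda)\) is parametrized by shape \(\alpha\) and rate \(\lambda\) (an assumption, since the parametrization is not stated here), the identity \(x f(x|\alpha,\lambda) = (\alpha/\lambda) f(x|\alpha+1,\lambda)\) gives \(E[\mathrm{TG}(\alpha,\lambda,t)] = (\alpha/\lambda)\,\mathcal{F}(t|\alpha+1,\lambda)/\mathcal{F}(t|\alpha,\lambda)\). The Python sketch below evaluates both quantities. It is not the lqr implementation. The function names are hypothetical, \(\xi_i\) is passed in directly, and the gamma cdf is computed from the standard series for the regularized lower incomplete gamma function.

```python
import math


def normal_pdf(y: float, mean: float, var: float) -> float:
    """Normal density phi(y | mean, var), parametrized by the variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def contaminated_normal_weights(y: float, mu: float, sigma: float, xi: float,
                                nu: float, gamma: float) -> tuple[float, float]:
    """Return (P(U = gamma | y), E[U | y]) for the contaminated normal case.

    Assumes U = gamma with probability nu and U = 1 with probability 1 - nu;
    a and b follow the expressions in the text, with mu = x_i^T beta_p.
    """
    a = nu * normal_pdf(y, mu, sigma**2 / (4.0 * gamma * xi**2))
    b = (1.0 - nu) * normal_pdf(y, mu, sigma**2 / (4.0 * xi**2))
    w = a / (a + b)
    return w, gamma * w + (1.0 - w)


def gamma_cdf(t: float, alpha: float, lam: float, tol: float = 1e-15) -> float:
    """cdf F(t | alpha, lam) of a Gamma with shape alpha and rate lam.

    Series: P(s, x) = x^s e^{-x} / Gamma(s) * sum_n x^n / (s (s+1) ... (s+n)),
    evaluated at s = alpha, x = lam * t; adequate for moderate lam * t.
    """
    x = lam * t
    if x <= 0.0:
        return 0.0
    term = 1.0 / alpha
    total = term
    n = 0
    while term > tol * total:
        n += 1
        term *= x / (alpha + n)
        total += term
    return math.exp(alpha * math.log(x) - x - math.lgamma(alpha)) * total


def truncated_gamma_mean(alpha: float, lam: float, t: float) -> float:
    """Mean of TG(alpha, lam, t), i.e. E[X | X <= t] for X ~ Gamma(alpha, lam)."""
    return (alpha / lam) * gamma_cdf(t, alpha + 1.0, lam) / gamma_cdf(t, alpha, lam)


if __name__ == "__main__":
    # With gamma < 1 the first component has the larger variance, so an observation
    # far from mu puts more weight on U = gamma and E[U | y] decreases (down-weighting).
    for y in (0.1, 3.0):
        print(y, contaminated_normal_weights(y, mu=0.0, sigma=1.0, xi=0.5, nu=0.1, gamma=0.2))
    print(truncated_gamma_mean(alpha=1.5, lam=2.0, t=1.0))
```

The conditional mean of \(U_i\) acts as an observation weight, which is how the contaminated normal model reduces the influence of outlying bounded responses on the logit scale.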

Figure 10

Histograms of the residuals and fitted SKD densities for the median quantile regression model in Section 4.1. Plots produced by the R function Log.lqr.

Figure 11

An example of the output of the R package lqr for the median quantile regression model in Section 4.1.

About this article

Cite this article

Galarza, C.E., Zhang, P. & Lachos, V.H. Logistic Quantile Regression for Bounded Outcomes Using a Family of Heavy-Tailed Distributions. Sankhya B (2020). https://doi.org/10.1007/s13571-020-00231-0

Keywords and phrases.

  • Bounded outcomes
  • Quantile regression model
  • EM algorithm
  • Scale mixtures of Normal distributions

AMS (2000) subject classification.

  • Primary 62M10
  • Secondary 62E10