Abstract
Mean regression models can be inadequate when the probability distribution of the observed responses is not symmetric. In such situations, quantile regression is a more robust alternative for accommodating outliers and misspecification of the error distribution, since it characterizes the entire conditional distribution of the outcome variable. This paper proposes a robust logistic quantile regression model that combines a logit link function with the EM-based algorithm for maximum likelihood estimation of the pth quantile regression parameters in Galarza et al. (Stat 6, 1, 2017). The underlying quantile regression (QR) model is built on a generalized class of skewed distributions, which comprises skewed versions of the normal, Student’s t, Laplace, contaminated normal, and slash distributions, among other heavy-tailed distributions. We evaluate how well our proposal accommodates bounded responses using a synthetic dataset, for which we consider a full model including categorical and continuous covariates as well as several of its sub-models. For the full model, we compare our proposal with a non-parametric alternative from the quantreg R package. The algorithm is implemented in the R package lqr, which provides full estimation and inference for the parameters, automatic selection of the best model, and simulation of envelope plots, which are useful for assessing goodness of fit.
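The role of the logit link can be illustrated with a minimal Python sketch. It is not the lqr implementation and does not use the EM algorithm; it only shows, for an intercept-only model, that a quantile computed on the logit scale and mapped back coincides with the quantile of the bounded response, because quantiles are preserved under monotone transformations. The bounds (0, 10), the Beta-distributed synthetic data, and all function names are assumptions made for illustration.

```python
import numpy as np


def logit_link(y, a=0.0, b=1.0):
    """Map a response bounded in the open interval (a, b) to the real line."""
    y = np.asarray(y, dtype=float)
    return np.log((y - a) / (b - y))


def inverse_logit_link(eta, a=0.0, b=1.0):
    """Map values on the logit scale back to the bounded scale (a, b)."""
    eta = np.asarray(eta, dtype=float)
    return a + (b - a) / (1.0 + np.exp(-eta))


def check_loss(residuals, p):
    """Summed check function rho_p(r) = r * (p - 1{r < 0})."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(r * (p - (r < 0))))


def sample_quantile(v, p):
    """Order-statistic sample quantile; one minimizer of the check loss."""
    v = np.sort(np.asarray(v, dtype=float))
    k = max(int(np.ceil(p * v.size)) - 1, 0)
    return float(v[k])


# Synthetic responses bounded in (0, 10); values are illustrative only.
rng = np.random.default_rng(0)
y = 10.0 * rng.beta(2.0, 5.0, size=201)
p = 0.25

q_on_logit_scale = sample_quantile(logit_link(y, 0.0, 10.0), p)
q_back_transformed = float(inverse_logit_link(q_on_logit_scale, 0.0, 10.0))
q_direct = sample_quantile(y, p)
print(q_back_transformed, q_direct)  # the two agree up to rounding error
```

With covariates, the constant on the logit scale would be replaced by a linear predictor, and the back-transformed fitted quantiles would stay inside the bounds by construction.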

References
Andrews, D. F. and Mallows, C. L. (1974). Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B 36, 99–102.
Barndorff-Nielsen, O. E. and Shephard, N. (2001). Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics. Journal of the Royal Statistical Society, Series B 63, 167–241.
Barrodale, I. and Roberts, F. (1977). Algorithms for restricted least absolute value estimation. Communications in Statistics - Simulation and Computation 6, 353–363.
Bayes, C. L., Bazan, J. L. and De Castro, M. (2017). A quantile parametric mixed regression model for bounded response variables. Statistics and its Interface 10, 483–493.
Benites, L., Lachos, V. H. and Vilca, F. (2013). Likelihood based inference for quantile regression using the asymmetric Laplace distribution, Technical Report 15, Universidade Estadual de Campinas.
Bottai, M., Cai, B. and McKeown, R. E. (2010). Logistic quantile regression for bounded outcomes. Statistics in Medicine 29, 2, 309–317.
Dempster, A., Laird, N. and Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38.
Ferrari, S. and Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics 31, 7, 799–815.
Galarza, C. E., Benites, L. and Lachos, V. H. (2015). lqr: Robust Linear Quantile Regression. R package version 1.5.
Galarza, C., Lachos, V., Barbosa Cabral, C. and Castro Cepero, L. (2017). Robust quantile regression using a generalized class of skewed distributions. Stat 6, 1.
Galvis, D. M., Bandyopadhyay, D. and Lachos, V. H. (2014). Augmented mixed beta regression models for periodontal proportion data. Statistics in Medicine 33, 21, 3759–3771.
Gómez-Déniz, E., Sordo, M. A. and Calderín-Ojeda, E. (2014). The log-Lindley distribution as an alternative to the beta regression model with applications in insurance. Insurance: Mathematics and Economics 54, 49–57.
Koenker, R. and Bassett, G., Jr. (1978). Regression quantiles. Econometrica: Journal of the Econometric Society 46, 33–50.
Koenker, R. W. and d’Orey, V. (1987). Algorithm AS 229: Computing regression quantiles. Journal of the Royal Statistical Society, Series C (Applied Statistics) 36, 3, 383–393.
Koenker, R. (2005). Quantile Regression, 38. Cambridge University Press, Cambridge.
Kottas, A. and Gelfand, A. E. (2001). Bayesian semiparametric median regression modeling. Journal of the American Statistical Association 96, 1458–1468.
Kottas, A. and Krnjajić, M. (2009). Bayesian semiparametric modelling in quantile regression. Scandinavian Journal of Statistics 36, 297–319.
Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology 46, 1-2, 79–88.
Liu, Y. and Wu, Y. (2009). Stepwise multiple quantile regression estimation using non-crossing constraints. Statistics and its Interface 2, 3, 299–310.
Liu, Y. and Wu, Y. (2011). Simultaneous multiple non-crossing quantile regression estimation using kernel constraints. Journal of Nonparametric Statistics 23, 2, 415–437.
McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society, Series B (Methodological) 42, 2, 109–142.
McCullagh, P. (1984). Generalized linear models. European Journal of Operational Research 16, 3, 285–292.
McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. Chapman & Hall/CRC, London.
Mu, Y. and He, X. (2007). Power transformation toward a linear regression quantile. Journal of the American Statistical Association 102, 477, 269–279.
Paz, R. F. d. et al. (2017). Alternative regression models to beta distribution under Bayesian approach.
Powell, J. L. (1986). Censored regression quantiles. Journal of Econometrics 32, 1, 143–155.
Tian, Y., Tian, M. and Zhu, Q. (2014). Linear Quantile Regression Based on EM Algorithm. Communications in Statistics - Theory and Methods 43, 16, 3464–3484.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 267–288.
Verkuilen, J. and Smithson, M. (2012). Mixed and mixture regression models for continuous bounded responses using the beta distribution. Journal of Educational and Behavioral Statistics 37, 1, 82–113.
Wichitaksorn, N., Choy, S. and Gerlach, R. (2014). A generalized class of skew distributions and associated robust quantile regression models. Canadian Journal of Statistics 42, 4, 579–596.
Yu, K. and Moyeed, R. (2001). Bayesian quantile regression. Statistics & Probability Letters 54, 437–447.
Zhou, Y.-h., Ni, Z.-x. and Li, Y. (n.d.). Quantile Regression via the EM Algorithm. Communications in Statistics - Simulation and Computation (10), 2162–2172.
Appendix
Details of expectations in EM algorithm
The conditional distribution of the latent variable given the observed data, \(f(u_{i}|y_{i},\boldsymbol {\theta }^{(k)})\), depends on the functional form of \(h(u_{i}|\nu )\). Table 3 shows the conditional pdf of U given Y for specific choices of \(h(u_{i}|\nu )\).
In Table 3, \(z_{i}= (y_{i}-\mathbf {x}_{i}^{\top }{\boldsymbol {\beta }}_{\!p})/\sigma \) and \(\mathcal {F}(x|\alpha ,\lambda )\) represents the cdf of a Gamma(α,λ) distribution. Moreover, a and b are given by \(a = \nu \phi \left (y_{i}|\mathbf {x}_{i}^{\top }{\boldsymbol {\beta }}_{\!p},\frac {\gamma ^{-1}\sigma ^{2}}{4{\xi _{i}^{2}}}\right )\) and \(b = (1-\nu )\phi \left (y_{i}|\mathbf {x}_{i}^{\top }{\boldsymbol {\beta }}_{\!p},\frac {\sigma ^{2}}{4{\xi _{i}^{2}}}\right ).\) The notation TG(α,λ,t) denotes a random variable with a Gamma(α,λ) distribution truncated to the right at the value t. Finally, GIG(ν,a,b) denotes the Generalized Inverse Gaussian (GIG) distribution (see Barndorff-Nielsen and Shephard (2001) for more details).
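The quantities a and b can be evaluated numerically, as in the Python sketch below. Table 3 is not reproduced here, so how a and b enter the conditional distribution is an assumption: in a two-component contaminated normal mixture, the ratio a/(a + b) would usually be read as the conditional weight of the component with inflated variance. The sketch further assumes that the second argument of \(\phi\) is a variance and treats \(\xi_{i}\) as a caller-supplied input, since it is not defined in this section. All function names are hypothetical.

```python
import numpy as np


def normal_logpdf(y, mean, variance):
    """Log-density of a normal distribution parameterized by its variance."""
    return -0.5 * (np.log(2.0 * np.pi * variance) + (y - mean) ** 2 / variance)


def contaminated_weight(y, mean, sigma, xi, nu, gamma):
    """Return a / (a + b) for the a and b defined in the text.

    Assumptions (not stated in this section): the second argument of phi is
    a variance, and xi is supplied by the caller.
    """
    base_var = sigma ** 2 / (4.0 * xi ** 2)
    log_a = np.log(nu) + normal_logpdf(y, mean, base_var / gamma)
    log_b = np.log1p(-nu) + normal_logpdf(y, mean, base_var)
    # a / (a + b) = 1 / (1 + exp(log_b - log_a)), evaluated stably.
    return float(np.exp(-np.logaddexp(0.0, log_b - log_a)))
```

Working on the log scale avoids the 0/0 that arises when both densities underflow for extreme residuals. As a sanity check, setting gamma = 1 makes the two components identical, and the ratio reduces to nu.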
Histograms of the residuals and fitted SKD densities for the median quantile regression model in Section 4.1. The plots were produced by calling the Log.lqr function in R.
Example output from the R package lqr for the median quantile regression model in Section 4.1.
Cite this article
Galarza, C.E., Zhang, P. & Lachos, V.H. Logistic Quantile Regression for Bounded Outcomes Using a Family of Heavy-Tailed Distributions. Sankhya B (2020). https://doi.org/10.1007/s13571-020-00231-0
Keywords and phrases.
- Bounded outcomes
- Quantile regression model
- EM algorithm
- Scale mixtures of Normal distributions
AMS (2000) subject classification.
- Primary 62M10
- Secondary 62E10.

