Abstract
In this paper we define finite mixtures of quantile and M-quantile regression models for heterogeneous and/or dependent/clustered data. Components of the finite mixture represent clusters of individuals with homogeneous values of the model parameters. Owing to their flexibility and ease of estimation, the proposed approaches can be extended to random coefficients of higher dimension than the simple random intercept case. Model parameters are estimated by maximum likelihood, using an EM-type algorithm. Standard errors for the model parameters are obtained from the inverse of the observed information matrix, derived through the Oakes (J R Stat Soc Ser B 61:479–482, 1999) formula in the M-quantile setting, and through nonparametric bootstrap in the quantile case. We present a large-scale simulation study to analyse the practical behaviour of the proposed models and to evaluate the empirical performance of the proposed standard error estimates. We consider a variety of empirical settings in both the random intercept and the random coefficient case. The proposed modelling approaches are also applied to two well-known datasets, which give further insight into their empirical behaviour.
References
Abramowitz, M., Stegun, I.: Handbook of Mathematical Functions. National Bureau of Standards, Washington, DC (1964)
Aitkin, M.: A general maximum likelihood analysis of overdispersion in generalized linear models. Stat. Comput. 6, 127–130 (1996)
Aitkin, M.: A general maximum likelihood analysis of variance components in generalized linear models. Biometrics 55, 117–128 (1999)
Aitkin, M., Francis, B., Hinde, J.: Statistical Modelling in GLIM, 2nd edn. Oxford University Press, Oxford (2005)
Alfò, M., Trovato, G.: Semiparametric mixture models for multivariate count data, with application. Econom. J. 7, 426–454 (2004)
Bianchi, A., Fabrizi, E., Salvati, N., Tzavidis, N.: M-quantile regression: diagnostics and parametric representation of the model. Working paper. http://www.sp.unipg.it/surwey/dowload/publications/24-mq-diagn.html (2015)
Bianchi, A., Salvati, N.: Asymptotic properties and variance estimators of the M-quantile regression coefficients estimators. Commun. Stat. 44, 2416–2429 (2015)
Breckling, J., Chambers, R.: \({M}\)-quantiles. Biometrika 75, 761–771 (1988)
Cantoni, E., Ronchetti, E.: Robust inference for generalized linear models. J. Am. Stat. Assoc. 96, 1022–1030 (2001)
Davis, C.: Semi-parametric and non-parametric methods for the analysis of repeated measurements with applications to clinical trials. Stat. Med. 10, 1959–1980 (1991)
DeSarbo, W., Cron, W.: A maximum likelihood methodology for clusterwise regression. J. Classif. 5, 249–282 (1988)
Farcomeni, A.: Quantile regression for longitudinal data based on latent Markov subject-specific parameters. Stat. Comput. 22, 141–152 (2012)
Follmann, D., Lambert, D.: Generalizing logistic regression by nonparametric mixing. J. Am. Stat. Assoc. 84, 295–300 (1989)
Friedl, H., Kauermann, G.: Standard errors for EM estimates in generalized linear models with random effects. Biometrics 56, 761–767 (2000)
Geraci, M., Bottai, M.: Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics 8, 140–154 (2007)
Geraci, M., Bottai, M.: Linear quantile mixed models. Stat. Comput. 24, 461–479 (2014)
Geyer, C., Thompson, E.: Constrained Monte Carlo maximum likelihood for dependent data. J. R. Stat. Soc. B 54, 657–699 (1992)
Gueorguieva, R.: A multivariate generalized linear mixed model for joint modelling of clustered outcomes in the exponential family. Stat. Model. 1, 177–193 (2001)
Hennig, C.: Identifiability of models for clusterwise linear regression. J. Classif. 17, 273–296 (2000)
Huber, P.: Robust estimation of a location parameter. Ann. Math. Stat. 35, 73–101 (1964)
Huber, P.: Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Stat. 1, 799–821 (1973)
Huber, P.J.: The behavior of maximum likelihood estimates under nonstandard conditions. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 221–233. Wiley, Amsterdam (1967)
Huber, P.J.: Robust Statistics. Wiley, Hoboken (1981)
Jank, W., Booth, J.: Efficiency of Monte Carlo EM and simulated maximum likelihood in two-stage hierarchical models. J. Comput. Graph. Stat. 12, 214–229 (2003)
Jones, M.C.: Expectiles and M-quantiles are quantiles. Stat. Probab. Lett. 20, 149–153 (1994)
Jung, S.: Quasi-likelihood for median regression models. J. Am. Stat. Assoc. 91, 251–257 (1996)
Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46, 33–50 (1978)
Koenker, R., D’Orey, V.: Computing regression quantiles. Biometrika 93, 255–268 (1987)
Kokic, P., Chambers, R., Breckling, J., Beare, S.: A measure of production performance. J. Bus. Econ. Stat. 10, 419–435 (1997)
Laird, N.M.: Nonparametric maximum likelihood estimation of a mixing distribution. J. Am. Stat. Assoc. 73, 805–811 (1978)
Liu, Q., Pierce, D.: A note on Gaussian–Hermite quadrature. Biometrika 81, 624–629 (1994)
Liu, Y., Bottai, M.: Mixed-effects models for conditional quantiles with longitudinal data. Int. J. Biostat. 5, 1–22 (2009)
Louis, T.: Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. Ser. B 44, 226–233 (1982)
McCulloch, C.: Maximum likelihood estimation of variance components for binary data. J. Am. Stat. Assoc. 89, 330–335 (1994)
Munkin, M.K., Trivedi, P.K.: Simulated maximum likelihood estimation of multivariate mixed-Poisson regression models, with application. Econom. J. 2, 29–48 (1999)
Newey, W., Powell, J.: Asymmetric least squares estimation and testing. Econometrica 55, 819–847 (1987)
Oakes, D.: Direct calculation of the information matrix via the EM algorithm. J. R. Stat. Soc. Ser. B 61, 479–482 (1999)
Pinheiro, J., Bates, D.: Approximations to the log-likelihood function in the nonlinear mixed-effects model. J. Comput. Graph. Stat. 4, 12–35 (1995)
Press, W., Teukolsky, S., Vetterling, W., Flannery, B.: Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, New York (2007)
Street, J., Carroll, R., Ruppert, D.: A note on computing robust regression estimates via iteratively reweighted least squares. Am. Stat. 42, 152–154 (1988)
Treatment of Lead-Exposed Children (TLC) Trial Group: Safety and efficacy of succimer in toddlers with blood lead levels of 20–44 \(\mu {\rm g/dl}\). Pediatr. Res. 48, 593–599 (2000)
Tzavidis, N., Salvati, N., Schmid, T., Flouri, E., Midouhas, E.: Longitudinal analysis of the Strengths and Difficulties Questionnaire scores of the Millennium Cohort Study children in England using M-quantile random effects regression. J. R. Stat. Soc. A. 179, 427–452 (2016)
Wang, P., Puterman, M., Cockburn, I., Le, N.: Mixed Poisson regression models with covariate dependent rates. Biometrics 52, 381–400 (1996)
Wang, Y., Lin, X., Zhu, M., Bai, Z.: Robust estimation using the Huber function with a data-dependent tuning constant. J. Comput. Graph. Stat. 16(2), 468–481 (2007)
Wedel, M., DeSarbo, W.: A mixture likelihood approach for generalized linear models. J. Classif. 12, 21–55 (1995)
White, H.: A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838 (1980)
Acknowledgments
The work of Salvati and Ranalli has been developed under the support of the project PRIN-SURWEY http://www.sp.unipg.it/surwey/ (Grant 2012F42NS8, Italy). Marco Alfò acknowledges the financial support from the grant RBFR12SHVV of the Italian Government (FIRB project “Mixture and latent variable models for causal inference and analysis of socio-economic data”).
Cite this article
Alfò, M., Salvati, N. & Ranalli, M.G. Finite mixtures of quantile and M-quantile regression models. Stat Comput 27, 547–570 (2017). https://doi.org/10.1007/s11222-016-9638-1
DOI: https://doi.org/10.1007/s11222-016-9638-1