Statistics and Computing, Volume 27, Issue 2, pp 547–570

Finite mixtures of quantile and M-quantile regression models

  • Marco Alfò
  • Nicola Salvati (corresponding author)
  • M. Giovanna Ranalli


In this paper we define a finite mixture of quantile and M-quantile regression models for heterogeneous and/or dependent (clustered) data. Components of the finite mixture represent clusters of individuals with homogeneous values of the model parameters. Owing to their flexibility and ease of estimation, the proposed approaches can be extended to random coefficients of higher dimension than in the simple random intercept case. Model parameters are estimated by maximum likelihood through an EM-type algorithm. In the M-quantile setting, standard errors of the model parameters are obtained from the inverse of the observed information matrix, derived through the Oakes (J R Stat Soc Ser B 61:479–482, 1999) formula; in the quantile case, they are obtained through the nonparametric bootstrap. We present a large-scale simulation study to analyse the practical behaviour of the proposed models and to evaluate the empirical performance of the proposed standard error estimators. We consider a variety of empirical settings in both the random intercept and the random coefficient case. The proposed modelling approaches are also applied to two well-known datasets, which give further insights into their empirical behaviour.
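For the quantile case, the EM logic summarised above can be illustrated with a deliberately stripped-down sketch. It is not the authors' implementation. It assumes an asymmetric Laplace (ALD) working likelihood with a common scale sigma (as in Geraci and Bottai 2007), a location-only model y_ij = zeta_k + e_ij with no covariates (so the M-step for each zeta_k is an exact weighted tau-quantile rather than a weighted linear program), a K-point discrete random intercept, and a deterministic starting point. Standard errors come from a generic nonparametric bootstrap that resamples whole units and aligns components by sorting zeta, which is a simple heuristic against label switching. All function names are hypothetical, and the M-quantile counterpart (Huber-type loss, Oakes-formula standard errors) is not shown.

```python
import math
import random
import statistics
from typing import List, Tuple


def check_loss(u: float, tau: float) -> float:
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_quantile(values: List[float], weights: List[float], tau: float) -> float:
    """Exact minimiser of sum_m w_m * rho_tau(v_m - q): the first sorted value
    at which the cumulative weight reaches tau times the total weight."""
    pairs = sorted(zip(values, weights))
    target = tau * sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= target:
            return v
    return pairs[-1][0]


def fit_mixture_quantile(groups: List[List[float]], tau: float, K: int,
                         max_iter: int = 500, tol: float = 1e-9
                         ) -> Tuple[List[float], List[float], float, List[List[float]], float]:
    """EM for y_ij = zeta_k + e_ij, e_ij ~ ALD(0, sigma, tau), where unit i
    (a list of repeated measures) falls in component k with probability pi_k.
    Returns zeta, pi, sigma, the posterior weights w_ik and the log-likelihood
    from the final E-step."""
    values = [y for g in groups for y in g]
    n_obs, n_units = len(values), len(groups)
    pooled = sorted(values)
    # Deterministic start (an assumption): locations at spread pooled quantiles.
    zeta = [pooled[int((k + 0.5) * n_obs / K)] for k in range(K)]
    pi = [1.0 / K] * K
    sigma = 1.0
    loglik_old = -math.inf
    w: List[List[float]] = []
    loglik = -math.inf
    for _ in range(max_iter):
        # E-step: log pi_k + sum_j log f_ALD(y_ij; zeta_k, sigma, tau) per unit.
        log_c = math.log(tau * (1.0 - tau) / sigma)  # ALD log normalising constant
        w, loglik = [], 0.0
        for g in groups:
            a = [math.log(max(pi[k], 1e-300))
                 + sum(log_c - check_loss(y - zeta[k], tau) / sigma for y in g)
                 for k in range(K)]
            m = max(a)
            lse = m + math.log(sum(math.exp(x - m) for x in a))  # log-sum-exp
            loglik += lse
            w.append([math.exp(x - lse) for x in a])
        # M-step: pi_k = mean weight; zeta_k = weighted tau-quantile; sigma = mean
        # weighted check loss (the closed-form ALD scale update).
        pi = [sum(wi[k] for wi in w) / n_units for k in range(K)]
        for k in range(K):
            obs_w = [wi[k] for wi, g in zip(w, groups) for _ in g]  # same order as values
            zeta[k] = weighted_quantile(values, obs_w, tau)
        sigma = sum(wi[k] * check_loss(y - zeta[k], tau)
                    for wi, g in zip(w, groups) for k in range(K) for y in g) / n_obs
        sigma = max(sigma, 1e-12)  # guard against a degenerate zero scale
        if abs(loglik - loglik_old) < tol:
            break
        loglik_old = loglik
    return zeta, pi, sigma, w, loglik


def cluster_bootstrap_se(groups: List[List[float]], tau: float, K: int,
                         n_boot: int = 50, seed: int = 1) -> List[float]:
    """Nonparametric bootstrap standard errors for zeta_k: resample whole units
    with replacement, refit, and align components by sorting zeta."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        sample = [groups[rng.randrange(len(groups))] for _ in range(len(groups))]
        draws.append(sorted(fit_mixture_quantile(sample, tau, K)[0]))
    return [statistics.stdev([d[k] for d in draws]) for k in range(K)]
```

For example, calling `fit_mixture_quantile(groups, tau=0.5, K=2)` on two well-separated sets of units should place one zeta_k near each group's median and give each unit a posterior weight close to 1 for its own component. With covariates, the zeta_k update would become a weighted quantile regression (for instance solved by linear programming, cf. Koenker and D'Orey 1987), while the sigma update keeps the same closed form, because the ALD density depends on sigma only through the factor 1/sigma.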


Keywords: Random effect models · Nuisance parameters · Nonparametric MLE · Robust models · Quantile regression · M-quantile regression · Longitudinal data



Acknowledgements

The work of Salvati and Ranalli was supported by the PRIN-SURWEY project (Grant 2012F42NS8, Italy). Marco Alfò acknowledges financial support from grant RBFR12SHVV of the Italian Government (FIRB project “Mixture and latent variable models for causal inference and analysis of socio-economic data”).


References

  1. Abramowitz, M., Stegun, I.: Handbook of Mathematical Functions. National Bureau of Standards, Washington, DC (1964)
  2. Aitkin, M.: A general maximum likelihood analysis of overdispersion in generalized linear models. Stat. Comput. 6, 127–130 (1996)
  3. Aitkin, M.: A general maximum likelihood analysis of variance components in generalized linear models. Biometrics 55, 117–128 (1999)
  4. Aitkin, M., Francis, B., Hinde, J.: Statistical Modelling in GLIM, 2nd edn. Oxford University Press, Oxford (2005)
  5. Alfò, M., Trovato, G.: Semiparametric mixture models for multivariate count data, with application. Econom. J. 7, 426–454 (2004)
  6. Bianchi, A., Fabrizi, E., Salvati, N., Tzavidis, N.: M-quantile regression: diagnostics and parametric representation of the model. Working paper (2015)
  7. Bianchi, A., Salvati, N.: Asymptotic properties and variance estimators of the M-quantile regression coefficients estimators. Commun. Stat. 44, 2416–2429 (2015)
  8. Breckling, J., Chambers, R.: M-quantiles. Biometrika 75, 761–771 (1988)
  9. Cantoni, E., Ronchetti, E.: Robust inference for generalized linear models. J. Am. Stat. Assoc. 96, 1022–1030 (2001)
  10. Davis, C.: Semi-parametric and non-parametric methods for the analysis of repeated measurements with applications to clinical trials. Stat. Med. 10, 1959–1980 (1991)
  11. DeSarbo, W., Cron, W.: A maximum likelihood methodology for clusterwise regression. J. Classif. 5, 249–282 (1988)
  12. Farcomeni, A.: Quantile regression for longitudinal data based on latent Markov subject-specific parameters. Stat. Comput. 22, 141–152 (2012)
  13. Follmann, D., Lambert, D.: Generalizing logistic regression by nonparametric mixing. J. Am. Stat. Assoc. 84, 295–300 (1989)
  14. Friedl, H., Kauermann, G.: Standard errors for EM estimates in generalized linear models with random effects. Biometrics 56, 761–767 (2000)
  15. Geraci, M., Bottai, M.: Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics 8, 140–154 (2007)
  16. Geraci, M., Bottai, M.: Linear quantile mixed models. Stat. Comput. 24, 461–479 (2014)
  17. Geyer, C., Thompson, E.: Constrained Monte Carlo maximum likelihood for dependent data. J. R. Stat. Soc. B 54, 657–699 (1992)
  18. Gueorguieva, R.: A multivariate generalized linear mixed model for joint modelling of clustered outcomes in the exponential family. Stat. Model. 1, 177–193 (2001)
  19. Hennig, C.: Identifiability of models for clusterwise linear regression. J. Classif. 17, 273–296 (2000)
  20. Huber, P.: Robust estimation of a location parameter. Ann. Math. Stat. 35, 73–101 (1964)
  21. Huber, P.: Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Stat. 1, 799–821 (1973)
  22. Huber, P.J.: The behavior of maximum likelihood estimates under nonstandard conditions. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 221–233. University of California Press, Berkeley (1967)
  23. Huber, P.J.: Robust Statistics. Wiley, Hoboken (1981)
  24. Jank, W., Booth, J.: Efficiency of Monte Carlo EM and simulated maximum likelihood in two-stage hierarchical models. J. Comput. Graph. Stat. 12, 214–229 (2003)
  25. Jones, M.C.: Expectiles and M-quantiles are quantiles. Stat. Probab. Lett. 20, 149–153 (1994)
  26. Jung, S.: Quasi-likelihood for median regression models. J. Am. Stat. Assoc. 91, 251–257 (1996)
  27. Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46, 33–50 (1978)
  28. Koenker, R., D’Orey, V.: Computing regression quantiles. Appl. Stat. 36, 383–393 (1987)
  29. Kokic, P., Chambers, R., Breckling, J., Beare, S.: A measure of production performance. J. Bus. Econ. Stat. 10, 419–435 (1997)
  30. Laird, N.M.: Nonparametric maximum likelihood estimation of a mixing distribution. J. Am. Stat. Assoc. 73, 805–811 (1978)
  31. Liu, Q., Pierce, D.: A note on Gauss–Hermite quadrature. Biometrika 81, 624–629 (1994)
  32. Liu, Y., Bottai, M.: Mixed-effects models for conditional quantiles with longitudinal data. Int. J. Biostat. 5, 1–22 (2009)
  33. Louis, T.: Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. Ser. B 44, 226–233 (1982)
  34. McCulloch, C.: Maximum likelihood estimation of variance components for binary data. J. Am. Stat. Assoc. 89, 330–335 (1994)
  35. Munkin, M.K., Trivedi, P.K.: Simulated maximum likelihood estimation of multivariate mixed-Poisson regression models, with application. Econom. J. 2, 29–48 (1999)
  36. Newey, W., Powell, J.: Asymmetric least squares estimation and testing. Econometrica 55, 819–847 (1987)
  37. Oakes, D.: Direct calculation of the information matrix via the EM algorithm. J. R. Stat. Soc. Ser. B 61, 479–482 (1999)
  38. Pinheiro, J., Bates, D.: Approximations to the log-likelihood function in the nonlinear mixed-effects model. J. Comput. Graph. Stat. 4, 12–35 (1995)
  39. Press, W., Teukolsky, S., Vetterling, W., Flannery, B.: Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, New York (2007)
  40. Street, J., Carroll, R., Ruppert, D.: A note on computing robust regression estimates via iteratively reweighted least squares. Am. Stat. 42, 152–154 (1988)
  41. Treatment of Lead-Exposed Children (TLC) Trial Group: Safety and efficacy of succimer in toddlers with blood lead levels of 20–44 μg/dl. Pediatr. Res. 48, 593–599 (2000)
  42. Tzavidis, N., Salvati, N., Schmid, T., Flouri, E., Midouhas, E.: Longitudinal analysis of the Strengths and Difficulties Questionnaire scores of the Millennium Cohort Study children in England using M-quantile random effects regression. J. R. Stat. Soc. Ser. A 179, 427–452 (2016)
  43. Wang, P., Puterman, M., Cockburn, I., Le, N.: Mixed Poisson regression models with covariate dependent rates. Biometrics 52, 381–400 (1996)
  44. Wang, Y., Lin, X., Zhu, M., Bai, Z.: Robust estimation using the Huber function with a data-dependent tuning constant. J. Comput. Graph. Stat. 16(2), 468–481 (2007)
  45. Wedel, M., DeSarbo, W.: A mixture likelihood approach for generalized linear models. J. Classif. 12, 21–55 (1995)
  46. White, H.: A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838 (1980)

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Marco Alfò (1)
  • Nicola Salvati (2), corresponding author
  • M. Giovanna Ranalli (3)

  1. Dipartimento di Scienze Statistiche, Sapienza Università di Roma, Rome, Italy
  2. Dipartimento di Economia e Management, Università di Pisa, Pisa, Italy
  3. Dipartimento di Scienze Politiche, Università degli Studi di Perugia, Perugia, Italy
