
Finite mixture of regression models for censored data based on scale mixtures of normal distributions

  • Camila Borelli Zeller
  • Celso Rômulo Barbosa Cabral
  • Víctor Hugo Lachos
  • Luis Benites
Regular Article

Abstract

In statistical analysis, particularly in econometrics, the finite mixture of regression models based on the normality assumption is routinely used to analyze censored data. In this work, we extend this model by considering the class of scale mixtures of normal (SMN) distributions. This approach allows us to model data with great flexibility, accommodating multimodality and heavy tails simultaneously. The main advantage of formulating the finite mixture of regression models for censored data within the SMN class is that these models admit a convenient hierarchical representation, which makes inference easy to implement. We develop a simple EM-type algorithm to obtain maximum likelihood estimates of the parameters of the proposed model. We assess the performance of the proposed method through simulation studies and the analysis of a real dataset. The proposed algorithm and methods are implemented in the new R package CensMixReg.
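The abstract does not spell out the model, so the following is only a minimal Python sketch, under our own assumptions, of two ingredients it mentions: the hierarchical (scale-mixture) representation, used here to simulate data, and the observed-data log-likelihood together with the posterior component-membership probabilities that an EM-type algorithm for a mixture of censored regressions would compute in its E-step. We assume left censoring at a known threshold and use the contaminated normal member of the SMN class for every component. All function names and parameter keys (`simulate`, `responsibilities_and_loglik`, `cutoff`, ...) are hypothetical; this is not the authors' algorithm and not the CensMixReg API.

```python
import math
import random

# Hypothetical sketch (not the CensMixReg API) of a g-component mixture of linear
# regressions with left-censored responses. Each component error is contaminated
# normal, one member of the SMN class:
#   U = gamma with probability nu, U = 1 otherwise;  Y | U ~ N(x'beta, sigma2 / U).


def norm_pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def cn_pdf(y, mu, sigma2, nu, gamma):
    """Contaminated normal density: N(mu, sigma2/gamma) and N(mu, sigma2) mixed with weights nu, 1 - nu."""
    s_out, s_in = math.sqrt(sigma2 / gamma), math.sqrt(sigma2)
    return (nu * norm_pdf((y - mu) / s_out) / s_out
            + (1.0 - nu) * norm_pdf((y - mu) / s_in) / s_in)


def cn_cdf(y, mu, sigma2, nu, gamma):
    """Contaminated normal CDF: the two normal CDFs mixed with the same weights."""
    s_out, s_in = math.sqrt(sigma2 / gamma), math.sqrt(sigma2)
    return nu * norm_cdf((y - mu) / s_out) + (1.0 - nu) * norm_cdf((y - mu) / s_in)


def contribution(y, censored, x, comp):
    """Likelihood contribution of one observation under one component:
    the density if y is observed, P(Y <= y) if y is a left-censoring point."""
    mu = sum(b * xk for b, xk in zip(comp["beta"], x))
    args = (y, mu, comp["sigma2"], comp["nu"], comp["gamma"])
    return cn_cdf(*args) if censored else cn_pdf(*args)


def responsibilities_and_loglik(data, params):
    """data: list of (y, censored, x); params: list of component dicts with keys
    p, beta, sigma2, nu, gamma. Returns (z, loglik), where z[i][j] is the posterior
    probability that observation i came from component j (an E-step quantity)."""
    z, loglik = [], 0.0
    for y, censored, x in data:
        w = [comp["p"] * contribution(y, censored, x, comp) for comp in params]
        total = sum(w)
        loglik += math.log(total)
        z.append([wj / total for wj in w])
    return z, loglik


def simulate(n, params, cutoff, seed=0):
    """Draw data through the hierarchical representation: component S ~ Cat(p),
    then U | S, then Y = x'beta + sqrt(sigma2 / U) * Z with Z ~ N(0, 1).
    Responses at or below `cutoff` are recorded as `cutoff` and flagged censored."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [1.0, rng.uniform(-2.0, 2.0)]  # intercept plus one covariate
        comp = rng.choices(params, weights=[c["p"] for c in params])[0]
        u = comp["gamma"] if rng.random() < comp["nu"] else 1.0
        mu = sum(b * xk for b, xk in zip(comp["beta"], x))
        y = mu + math.sqrt(comp["sigma2"] / u) * rng.gauss(0.0, 1.0)
        data.append((max(y, cutoff), y <= cutoff, x))
    return data
```

For example, with two components given as dicts such as `{"p": 0.6, "beta": [-1.0, -4.0], "sigma2": 2.0, "nu": 0.1, "gamma": 0.1}`, calling `simulate(200, params, cutoff=0.0)` and then `responsibilities_and_loglik(data, params)` returns one row of membership probabilities per observation and the log-likelihood at those parameter values. The contaminated normal was chosen only because its CDF is a finite mixture of normal CDFs computable with `math.erf`; a Student-t member would need an incomplete beta function. A complete EM-type algorithm would typically also need conditional moments of truncated SMN distributions for the censored observations, which this sketch omits.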

Keywords

Censoring; EM-type algorithm; Finite mixture of regression models; Scale mixtures of normal distributions

Mathematics Subject Classification

62H30; 62J05; 62N01

References

  1. Andrews DF, Mallows CL (1974) Scale mixtures of normal distributions. J R Stat Soc Ser B 36:99–102
  2. Arellano-Valle RB, Castro L, González-Farías G, Muñoz Gajardo K (2012) Student-t censored regression model: properties and inference. Stat Methods Appl 21:453–473
  3. Ateya SF (2014) Maximum likelihood estimation under a finite mixture of generalized exponential distributions based on censored data. Stat Pap 55:311–325
  4. Basso RM, Lachos VH, Cabral CRB, Ghosh P (2010) Robust mixture modeling based on scale mixtures of skew-normal distributions. Comput Stat Data Anal 54:2926–2941
  5. Benites L, Lachos VH, Moreno EJL (2017) CensMixReg: censored linear mixture regression models. R package version 3.0. https://CRAN.R-project.org/package=CensMixReg
  6. Cabral CRB, Lachos VH, Prates MO (2012) Multivariate mixture modeling using skew-normal independent distributions. Comput Stat Data Anal 56:126–142
  7. Caudill SB (2012) A partially adaptive estimator for the censored regression model based on a mixture of normal distributions. Stat Methods Appl 21:121–137
  8. Cuesta-Albertos JA, Gordaliza A, Matrán C (1997) Trimmed k-means: an attempt to robustify quantizers. Ann Stat 25:553–576
  9. Depraetere N, Vandebroek M (2014) Order selection in finite mixtures of linear regressions: literature review and a simulation study. Stat Pap 55:871–911
  10. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
  11. Fagundes RA, de Souza RM, Cysneiros FJA (2013) Robust regression with application to symbolic interval data. Eng Appl Artif Intell 26:564–573
  12. Faria S, Soromenho G (2010) Fitting mixtures of linear regressions. J Stat Comput Simul 80(2):201–225
  13. Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, New York
  14. Galimberti G, Soffritti G (2014) A multivariate linear regression analysis using finite mixtures of t distributions. Comput Stat Data Anal 71:138–150
  15. Garay AM, Lachos VH, Bolfarine H, Cabral CRB (2015) Linear censored regression models with scale mixtures of normal distributions. Stat Pap 58:247–278
  16. Garay AM, Lachos VH, Lin TI (2016) Nonlinear censored regression models with heavy-tailed distributions. Stat Interface 9:281–293
  17. Greene WH (2012) Econometric analysis, 7th edn. Pearson, Harlow
  18. Grün B, Leisch F (2008) Finite mixtures of generalized linear regression models. In: Recent advances in linear models and related areas: essays in honour of Helge Toutenburg. Physica-Verlag HD, Heidelberg, pp 205–230
  19. He J (2013) Mixture model based multivariate statistical analysis of multiply censored environmental data. Adv Water Resour 59:15–24
  20. Hennig C (2000) Identifiability of models for clusterwise linear regression. J Classif 17:273–296
  21. Hennig C (2012) trimcluster: cluster analysis with trimming. R package version 0.1-2. https://CRAN.R-project.org/package=trimcluster
  22. Karlsson M, Laitila T (2014) Finite mixture modeling of censored regression models. Stat Pap 55:627–642
  23. Kaufman L, Rousseeuw P (1990) Finding groups in data. Wiley, New York
  24. Lachos VH, Moreno EJL, Chen K, Cabral CRB (2017) Finite mixture modeling of censored data using the multivariate Student-t distribution. J Multivar Anal 159:151–167
  25. Lange KL, Sinsheimer JS (1993) Normal/independent distributions and their applications in robust regression. J Comput Graph Stat 2:175–198
  26. Lin TI, Ho HJ, Lee CR (2014) Flexible mixture modelling using the multivariate skew-t-normal distribution. Stat Comput 24:531–546
  27. Liu C, Rubin DB (1994) The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81:633–648
  28. Louis T (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 44:226–233
  29. Massuia MB, Cabral CRB, Matos LA, Lachos VH (2015) Influence diagnostics for Student-t censored linear regression models. Statistics 49:1074–1094
  30. MATLAB (2016) version 9.0 (R2016a). The MathWorks Inc., Natick, Massachusetts
  31. Mazza A, Punzo A (2017) Mixtures of multivariate contaminated normal regression models. Stat Pap. https://doi.org/10.1007/s00362-017-0964-y
  32. McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions. John Wiley & Sons, New Jersey
  33. McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
  34. Melenberg B, van Soest A (1996) Parametric and semi-parametric modeling of vacation expenditures. J Appl Econom 11:59–76
  35. Miyata Y (2011) Maximum likelihood estimators in finite mixture models with censored data. J Stat Plan Inference 141:56–64
  36. Mouselimis L (2017) ClusterR: Gaussian mixture models, K-Means, mini-batch-Kmeans and K-Medoids clustering. R package version 1.0.5. https://CRAN.R-project.org/package=ClusterR
  37. Mroz TA (1987) The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions. Econometrica 55:765–799
  38. Powell JL (1984) Least absolute deviations estimation for the censored regression model. J Econom 25:303–325
  39. Powell JL (1986) Symmetrically trimmed least squares estimation for Tobit models. Econometrica 54:1435–1460
  40. R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
  41. Raftery AE (1995) Bayesian model selection in social research. Sociol Methodol 25:111–163
  42. Tzortzis G, Likas A (2014) The MinMax k-Means clustering algorithm. Pattern Recognit 47:2505–2516
  43. Vaida F, Liu L (2009) Fast implementation for normal mixed effects models with censored response. J Comput Graph Stat 18:797–817
  44. Vuong QH (1989) Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57:307–333
  45. Witte A (1980) Estimating an economic model of crime with individual data. Q J Econ 94:57–84
  46. Zhang B (2003) Regression clustering. In: Proceedings of the third IEEE international conference on data mining, Melbourne
  47. Zeller CB, Cabral CRB, Lachos VH (2016) Robust mixture regression modeling based on scale mixtures of skew-normal distributions. Test 25:375–396

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Camila Borelli Zeller (1)
  • Celso Rômulo Barbosa Cabral (2), corresponding author
  • Víctor Hugo Lachos (3)
  • Luis Benites (4)
  1. Departamento de Estatística, Universidade Federal de Juiz de Fora, Juiz de Fora, Brazil
  2. Departamento de Estatística, Universidade Federal do Amazonas, Manaus, Brazil
  3. Department of Statistics, University of Connecticut, Storrs, USA
  4. Departamento de Ciencias, Pontificia Universidad Católica del Perú, Lima, Peru
