Abstract
In statistical analysis, particularly in econometrics, finite mixtures of regression models based on the normality assumption are routinely used to analyze censored data. In this work, we propose an extension of this model based on the class of scale mixtures of normal (SMN) distributions. This approach allows us to model data with great flexibility, accommodating multimodality and heavy tails at the same time. The main advantage of formulating the finite mixture of regression models for censored data under the SMN class is that the class admits a convenient hierarchical representation, which makes inference easy to implement. We develop a simple EM-type algorithm for maximum likelihood estimation of the parameters of the proposed model. To examine the performance of the proposed method, we present simulation studies and analyze a real dataset. The proposed algorithm and methods are implemented in the new R package CensMixReg.
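To make the hierarchical representation concrete, the following minimal sketch simulates data of the kind described: a finite mixture of linear regressions whose errors follow a Student-t distribution (one member of the SMN class), with the response left-censored at a fixed point. It relies on the standard fact that a Student-t error can be written as a normal variable divided by the square root of a Gamma(ν/2, rate ν/2) mixing variable. The function name, the single uniform covariate, the left-censoring scheme, and all parameter values are illustrative assumptions; this is not the CensMixReg interface and not the paper's simulation design.

```python
import random


def simulate_censored_mixture(n, betas, sigmas, weights, nu, cens_point, seed=0):
    """Simulate a left-censored mixture of linear regressions with Student-t errors.

    Hypothetical helper for illustration only. Returns a list of tuples
    (x, y_observed, is_censored, component_label).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 5.0)  # single covariate (assumed design)
        # Latent component label drawn with the mixing proportions.
        g = rng.choices(range(len(weights)), weights=weights)[0]
        # SMN hierarchy: U ~ Gamma(shape nu/2, scale 2/nu), then
        # error | U ~ Normal(0, sigma^2 / U), which is Student-t marginally.
        u = rng.gammavariate(nu / 2.0, 2.0 / nu)
        z = rng.gauss(0.0, 1.0)
        intercept, slope = betas[g]
        y_latent = intercept + slope * x + sigmas[g] * z / u ** 0.5
        # Left censoring: values at or below cens_point are recorded as cens_point.
        is_censored = y_latent <= cens_point
        y_obs = cens_point if is_censored else y_latent
        data.append((x, y_obs, is_censored, g))
    return data


if __name__ == "__main__":
    sample = simulate_censored_mixture(
        n=500,
        betas=[(0.0, 1.0), (4.0, -1.0)],  # (intercept, slope) per component
        sigmas=[1.0, 0.5],
        weights=[0.6, 0.4],
        nu=4.0,
        cens_point=1.0,
    )
    share = sum(c for _, _, c, _ in sample) / len(sample)
    print(f"proportion censored: {share:.2f}")
```

In an EM-type fit, the component label `g`, the mixing variable `u`, and the latent response behind each censored observation would all be treated as missing data; here they are generated explicitly only to show how the layers stack.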
Cite this article
Zeller, C.B., Cabral, C.R.B., Lachos, V.H. et al. Finite mixture of regression models for censored data based on scale mixtures of normal distributions. Adv Data Anal Classif 13, 89–116 (2019). https://doi.org/10.1007/s11634-018-0337-y