Finite mixture of regression models for censored data based on scale mixtures of normal distributions

  • Regular Article
Advances in Data Analysis and Classification

Abstract

In statistical analysis, particularly in econometrics, finite mixtures of regression models based on the normality assumption are routinely used to analyze censored data. This work proposes an extension of that model based on scale mixtures of normal (SMN) distributions. The extension offers considerable flexibility, accommodating multimodality and heavy tails at the same time. A main advantage of placing finite mixtures of regression models for censored data in the SMN class is that the class admits a convenient hierarchical representation, which makes inference easy to implement. We develop a simple EM-type algorithm for maximum likelihood estimation of the parameters of the proposed model. To examine the performance of the proposed method, we present simulation studies and analyze a real dataset. The proposed algorithm and methods are implemented in the new R package CensMixReg.
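As an illustration of the hierarchical representation mentioned above, the following minimal Python sketch simulates data from a censored mixture of regressions with Student-t errors, one member of the SMN class. It is not the CensMixReg implementation. The function name, the uniform covariate design, the single left-censoring point, and all parameter values are assumptions made for the example only.

```python
import random


def sample_censored_mixture(n, weights, betas, sigmas, nu, cens_point, seed=0):
    """Simulate (x, observed y, censored flag) triples.

    Hypothetical sketch: each response is drawn in three stages,
      1. component label j with probability weights[j],
      2. scale factor u ~ Gamma(shape=nu/2, rate=nu/2),
      3. y | u ~ Normal(b0 + b1 * x, sigma_j**2 / u),
    which yields Student-t errors with nu degrees of freedom.
    Responses at or below cens_point are left-censored (an assumption).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x = rng.uniform(0.0, 5.0)  # assumed covariate design
        j = rng.choices(range(len(weights)), weights=weights)[0]
        # random.gammavariate takes (shape, scale); scale 2/nu equals rate nu/2
        u = rng.gammavariate(nu / 2.0, 2.0 / nu)
        mean = betas[j][0] + betas[j][1] * x
        y = rng.gauss(mean, sigmas[j] / u ** 0.5)
        censored = y <= cens_point
        # Only the censoring point is recorded for a censored response
        draws.append((x, cens_point if censored else y, censored))
    return draws


if __name__ == "__main__":
    data = sample_censored_mixture(
        n=500,
        weights=[0.6, 0.4],
        betas=[(0.0, 1.0), (4.0, -1.0)],
        sigmas=[1.0, 0.5],
        nu=4.0,
        cens_point=1.0,
    )
    share = sum(flag for _, _, flag in data) / len(data)
    print(f"censored share: {share:.2f}")
```

Conditional on the label and the scale factor, each response is normal. This is the property that an EM-type algorithm can exploit by treating both quantities as missing data.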

Author information

Corresponding author

Correspondence to Celso Rômulo Barbosa Cabral.

About this article

Cite this article

Zeller, C.B., Cabral, C.R.B., Lachos, V.H. et al. Finite mixture of regression models for censored data based on scale mixtures of normal distributions. Adv Data Anal Classif 13, 89–116 (2019). https://doi.org/10.1007/s11634-018-0337-y

  • DOI: https://doi.org/10.1007/s11634-018-0337-y
