Abstract
In some applications of censored regression models, the distribution of the error terms departs significantly from normality, for instance in the presence of heavy tails, skewness and/or atypical observations. In this paper we extend the censored linear regression model with normal errors to the case where the random errors follow a finite mixture of Student-t distributions. This approach allows us to model data with great flexibility, accommodating multimodality, heavy tails and also skewness, depending on the structure of the mixture components. We develop an analytically tractable and efficient EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters, with standard errors as a by-product. The E-step of the algorithm has closed-form expressions that rely on formulas for the mean and variance of truncated Student-t distributions. The efficacy of the method is verified through the analysis of simulated and real datasets. The proposed algorithm and methods are implemented in the new R package \(\texttt{CensMixReg}\).
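As a rough illustration of the kind of truncated Student-t moment that such an E-step relies on, the following Python sketch computes the mean of a standard Student-t variable truncated to a finite interval and checks it against numerical integration. It is not the paper's algorithm and not the code of the CensMixReg package. The function names (`t_pdf`, `simpson`, `truncated_t_mean`) are hypothetical, and the sketch assumes a standard (location 0, scale 1) Student-t with more than one degree of freedom and finite truncation limits. The numerator uses the elementary antiderivative of t f(t), while the probability mass of the interval is obtained numerically to keep the example free of external libraries.

```python
import math


def t_pdf(x, nu):
    """Density of the standard Student-t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def truncated_t_mean(a, b, nu):
    """Mean of a standard Student-t (nu > 1) truncated to the finite interval (a, b).

    Illustrative helper (an assumption of this sketch, not a library function).
    The integral of t * f(t) over (a, b) has the closed form
    nu / (nu - 1) * [(1 + a^2/nu) f(a) - (1 + b^2/nu) f(b)].
    """
    mass = simpson(lambda x: t_pdf(x, nu), a, b)  # P(a < T < b), computed numerically
    numerator = nu / (nu - 1) * (
        (1 + a * a / nu) * t_pdf(a, nu) - (1 + b * b / nu) * t_pdf(b, nu)
    )
    return numerator / mass


# Compare the closed-form numerator with a fully numerical evaluation.
closed_form = truncated_t_mean(-1.0, 2.5, nu=4)
numeric = (simpson(lambda x: x * t_pdf(x, 4), -1.0, 2.5)
           / simpson(lambda x: t_pdf(x, 4), -1.0, 2.5))
symmetric = truncated_t_mean(-2.0, 2.0, nu=5)  # symmetric interval, mean is zero
print(closed_form, numeric, symmetric)
```

In a censored regression setting, a conditional mean of this type would stand in for the unobserved response of a censored observation, after shifting and scaling by the component's location and scale parameters. That mapping is only indicated here, not implemented.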
Acknowledgements
We are grateful to four anonymous referees, the editor and the associate editor for very useful comments and suggestions, which greatly improved this paper. This paper was written while Celso R. B. Cabral was a visiting professor in the Department of Statistics at the University of Campinas, Brazil. Celso R. B. Cabral was supported by CNPq (Grants 167731/2013-0 and 447964/2014-3) and FAPESP-Brazil (Grant 2015/20922-5). V.H. Lachos acknowledges support from FAPESP-Brazil (Grant 2018/05013-7). M.O. Prates was supported by CNPq-Brazil (Grant PQ-305401/2017-7) and FAPEMIG-Brazil (Grant PPM-00532-16). We also thank Luis B. Sanchez from the University of São Paulo for his help with an earlier version of the article.
Cite this article
Lachos, V.H., Cabral, C.R.B., Prates, M.O. et al. Flexible regression modeling for censored data based on mixtures of student-t distributions. Comput Stat 34, 123–152 (2019). https://doi.org/10.1007/s00180-018-0856-1
DOI: https://doi.org/10.1007/s00180-018-0856-1