
Computational Statistics, Volume 34, Issue 1, pp 123–152

Flexible regression modeling for censored data based on mixtures of student-t distributions

  • Víctor H. Lachos
  • Celso R. B. Cabral
  • Marcos O. Prates
  • Dipak K. Dey
Original Paper

Abstract

In some applications of censored regression models, the distribution of the error terms departs significantly from normality, for instance in the presence of heavy tails, skewness and/or atypical observations. In this paper we extend the censored linear regression model with normal errors to the case where the random errors follow a finite mixture of Student-t distributions. This approach allows us to model data with great flexibility, accommodating multimodality, heavy tails and also skewness, depending on the structure of the mixture components. We develop an analytically tractable and efficient EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters, with standard errors as a by-product. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of the truncated Student-t distribution. The efficacy of the method is verified through the analysis of simulated and real datasets. The proposed algorithm and methods are implemented in the new R package CensMixReg.
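To make the model in the abstract concrete, the following is a minimal Python sketch that simulates data from a censored linear regression whose errors follow a finite mixture of Student-t distributions. It is not the authors' code and does not use the CensMixReg package. The function names, the left-censoring at a fixed cutoff, the single uniform covariate, and the component-specific location, scale and degrees-of-freedom parameterization are all assumptions made for illustration. Each Student-t draw uses the standard scale mixture of normals representation.

```python
import math
import random


def sample_student_t(rng, mu, sigma, nu):
    """Draw from a location-scale Student-t via its scale mixture of normals form."""
    # U ~ Gamma(shape = nu/2, rate = nu/2); given U, the draw is N(mu, sigma^2 / U).
    # random.gammavariate takes (shape, scale), so the scale is 2/nu.
    u = rng.gammavariate(nu / 2.0, 2.0 / nu)
    return mu + sigma * rng.gauss(0.0, 1.0) / math.sqrt(u)


def simulate_censored_mixture(n, beta, weights, mus, sigmas, nus, cutoff, seed=0):
    """Simulate (x, y_observed, censored) triples under assumed left-censoring.

    Hypothetical setup: y_latent = x' beta + eps, where eps follows a finite
    mixture of Student-t components; y_latent is recorded as `cutoff`
    whenever it falls at or below `cutoff`.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (1.0, rng.uniform(0.0, 1.0))  # intercept plus one covariate
        # Pick the mixture component for this observation.
        g = rng.choices(range(len(weights)), weights=weights)[0]
        eps = sample_student_t(rng, mus[g], sigmas[g], nus[g])
        y_latent = sum(b * xi for b, xi in zip(beta, x)) + eps
        censored = y_latent <= cutoff
        y_obs = cutoff if censored else y_latent
        data.append((x, y_obs, censored))
    return data
```

A call such as `simulate_censored_mixture(500, beta=(1.0, 2.0), weights=(0.6, 0.4), mus=(-1.0, 1.5), sigmas=(0.5, 1.0), nus=(3.0, 5.0), cutoff=0.5)` yields a bimodal, heavy-tailed sample in which only the cutoff is observed for the censored cases. This is the kind of data for which a normal-error Tobit model would likely be misspecified.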

Keywords

  • Censored regression model
  • EM-type algorithms
  • Finite mixture models
  • Heavy-tails
  • Tobit model

Notes

Acknowledgements

We are grateful to four anonymous referees, the editor and the associate editor for very useful comments and suggestions, which greatly improved this paper. This paper was written while Celso R. B. Cabral was a visiting professor in the Department of Statistics at the University of Campinas, Brazil. Celso R. B. Cabral was supported by CNPq (Grants 167731/2013-0 and 447964/2014-3) and FAPESP-Brazil (Grant 2015/20922-5). V.H. Lachos acknowledges support from FAPESP-Brazil (Grant 2018/05013-7). M.O. Prates was supported by CNPq-Brazil (Grant PQ-305401/2017-7) and FAPEMIG-Brazil (Grant PPM-00532-16). We also thank Luis B. Sanchez from the University of São Paulo for his help on an earlier version of the article.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Statistics, University of Connecticut, Storrs, USA
  2. Departamento de Estatística, Universidade Federal do Amazonas, Manaus, Brazil
  3. Departamento de Estatística, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
