Flexible regression modeling for censored data based on mixtures of student-t distributions

  • Original Paper
  • Published:
Computational Statistics

Abstract

In some applications of censored regression models, the distribution of the error terms departs significantly from normality, for instance, in the presence of heavy tails, skewness and/or atypical observations. In this paper we extend the censored linear regression model with normal errors to the case where the random errors follow a finite mixture of Student-t distributions. This approach allows us to model data with great flexibility, accommodating multimodality, heavy tails and, depending on the structure of the mixture components, skewness. We develop an analytically tractable and efficient EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters, with standard errors as a by-product. The algorithm has closed-form expressions at the E-step, which rely on formulas for the mean and variance of the truncated Student-t distribution. The efficacy of the method is verified through the analysis of simulated and real datasets. The proposed algorithm and methods are implemented in the new R package CensMixReg.
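The two ingredients named in the abstract, a censored likelihood with finite-mixture-of-t errors and the mean and variance of a truncated Student-t, can be illustrated with a minimal, standard-library-only Python sketch. It assumes a left-censored response (a censored value is an upper limit for the true response) and errors distributed as sum_j p_j t(mu_j, sigma_j^2, nu_j). This parametrization, the function names `t_pdf`, `t_cdf`, `truncated_t_moments` and `censored_fmt_loglik`, and the continued-fraction incomplete beta used for the t CDF are illustrative assumptions, not the paper's code or the CensMixReg API. The moments come from the identity x f(x) = -[(nu + x^2) f(x)]' / (nu - 1) for the standard t density f; whether the paper's E-step uses exactly these moments or moments of a related truncated distribution is not shown in this preview.

```python
import math

# Illustrative helpers (hypothetical names; not the CensMixReg API).


def t_pdf(x, nu):
    """Density of the standard Student-t distribution with nu degrees of freedom."""
    if math.isinf(x):
        return 0.0
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-step and odd-step coefficients of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def t_cdf(x, nu):
    """Standard Student-t CDF; P(|T| > |x|) = I_{nu/(nu+x^2)}(nu/2, 1/2)."""
    if math.isinf(x):
        return 1.0 if x > 0 else 0.0
    tail = 0.5 * _betainc(nu / 2.0, 0.5, nu / (nu + x * x))
    return 1.0 - tail if x > 0 else tail


def truncated_t_moments(lo, hi, mu, sigma, nu):
    """Mean and variance of Y = mu + sigma*T, T ~ t_nu, given lo < Y < hi (assumes nu > 2)."""
    a, b = (lo - mu) / sigma, (hi - mu) / sigma
    p = t_cdf(b, nu) - t_cdf(a, nu)

    def edge(z, k):
        # z**k * (nu + z**2) * f(z); this boundary term vanishes at +/- infinity.
        return 0.0 if math.isinf(z) else z ** k * (nu + z * z) * t_pdf(z, nu)

    m1 = (edge(a, 0) - edge(b, 0)) / ((nu - 1) * p)              # E[T | a < T < b]
    m2 = (nu * p + edge(a, 1) - edge(b, 1)) / ((nu - 2) * p)     # E[T^2 | a < T < b]
    return mu + sigma * m1, sigma ** 2 * (m2 - m1 ** 2)


def censored_fmt_loglik(y, X, censored, beta, weights, locs, scales, nus):
    """Observed-data log-likelihood under the assumed left-censored FM-t error model."""
    total = 0.0
    for yi, xi, ci in zip(y, X, censored):
        resid = yi - sum(xk * bk for xk, bk in zip(xi, beta))
        lik = 0.0
        for w, m, s, nu in zip(weights, locs, scales, nus):
            z = (resid - m) / s
            # Censored: P(error <= resid); observed: density of the error at resid.
            lik += w * (t_cdf(z, nu) if ci else t_pdf(z, nu) / s)
        total += math.log(lik)
    return total
```

For example, `truncated_t_moments(-math.inf, 1.0, 0.0, 1.0, 5.0)` returns the mean and variance of a t_5 error conditional on lying below 1, the situation of a left-censored observation with limit 1 in this sketch. Computing the CDF through the incomplete beta function keeps the example dependency-free; in practice a library routine would normally be used instead.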

Fig. 1
Fig. 2
Fig. 3

Acknowledgements

We are grateful to four anonymous referees, the editor and the associate editor for very useful comments and suggestions, which greatly improved this paper. This paper was written while Celso R. B. Cabral was a visiting professor in the Department of Statistics at the University of Campinas, Brazil. Celso R. B. Cabral was supported by CNPq (Grants 167731/2013-0 and 447964/2014-3) and FAPESP-Brazil (Grant 2015/20922-5). V.H. Lachos acknowledges support from FAPESP-Brazil (Grant 2018/05013-7). M.O. Prates was supported by CNPq-Brazil (Grant PQ-305401/2017-7) and FAPEMIG-Brazil (Grant PPM-00532-16). We also thank Luis B. Sanchez from the University of São Paulo for his help with an earlier version of the article.

Author information

Correspondence to Celso R. B. Cabral.

Appendix: Simulation Study 2: bias and RMSE with data generated by the FM-NCR model

See Tables 7 and 8.

Table 7 Simulation Study 2: Bias of estimates with data generated by the FM-NCR model
Table 8 Simulation Study 2: Root mean squared errors (RMSE) of estimates with data generated by the FM-NCR model
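The bias and RMSE in Tables 7 and 8 are presumably computed with the usual Monte Carlo definitions over M replicated samples, bias = (1/M) sum_m (theta_hat_m - theta) and RMSE = sqrt((1/M) sum_m (theta_hat_m - theta)^2); the paper's exact formulas are not shown in this preview, so treat this as an assumption. A minimal Python sketch, with a hypothetical function name:

```python
import math


def bias_and_rmse(estimates, true_value):
    """Monte Carlo bias and RMSE of one parameter (hypothetical helper, usual definitions)."""
    m = len(estimates)
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / m
    rmse = math.sqrt(sum(e * e for e in errors) / m)
    return bias, rmse


# Two hypothetical replicate estimates of a parameter whose true value is 1.0.
print(bias_and_rmse([3.0, 1.0], 1.0))  # -> (1.0, 1.4142135623730951)
```

Because the mean of the squared errors equals the squared mean error plus their (population) variance, RMSE^2 = bias^2 + variance of the estimates, so the RMSE is never smaller than the absolute bias.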

About this article

Cite this article

Lachos, V.H., Cabral, C.R.B., Prates, M.O. et al. Flexible regression modeling for censored data based on mixtures of student-t distributions. Comput Stat 34, 123–152 (2019). https://doi.org/10.1007/s00180-018-0856-1

  • DOI: https://doi.org/10.1007/s00180-018-0856-1
