
Statistics and Computing, Volume 20, Issue 3, pp 343–356

Robust mixture modeling using multivariate skew t distributions

  • Tsung-I Lin

Abstract

This paper presents a robust mixture modeling framework based on multivariate skew t distributions, an extension of the multivariate Student’s t family with additional shape parameters that regulate skewness. The proposed model results in a very complicated likelihood. Two variants of the Monte Carlo EM algorithm are developed to carry out maximum likelihood estimation of the mixture parameters. In addition, we offer a general information-based method for obtaining the asymptotic covariance matrix of the maximum likelihood estimates. Practical issues, including the selection of starting values and the stopping criterion, are also discussed. The proposed methodology is illustrated on a subset of the Australian Institute of Sport data.

Keywords

MCEM-type algorithms; MSN; MST; Multivariate truncated normal; Multivariate truncated t; Outliers



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Applied Mathematics and Institute of Statistics, National Chung Hsing University, Taichung, Taiwan
