Statistics and Computing, Volume 20, Issue 3, pp 343–356

Robust mixture modeling using multivariate skew t distributions

  • Tsung-I Lin


This paper presents a robust mixture modeling framework based on multivariate skew t distributions, an extension of the multivariate Student’s t family with additional shape parameters that regulate skewness. Because the resulting likelihood is very complicated, two variants of the Monte Carlo EM (MCEM) algorithm are developed to carry out maximum likelihood estimation of the mixture parameters. In addition, we offer a general information-based method for obtaining the asymptotic covariance matrix of the maximum likelihood estimates. Practical issues, including the selection of starting values and the stopping criterion, are also discussed. The proposed methodology is illustrated on a subset of the Australian Institute of Sport data.
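To see why the likelihood is hard to maximize directly, it helps to write the component density down. The block below is a sketch that assumes the model uses the Sahu, Dey and Branco (2003) form of the multivariate skew t, which the keywords on multivariate truncated normal and truncated t distributions suggest; the symbols (Δ as the diagonal matrix of skewness parameters δ, Ω, q, η, Λ) are our notation, not necessarily the paper's. The p-variate t distribution function T_p has no closed form, and the mixture log-likelihood is a sum of logs of weighted sums of such densities. The hierarchical representation on the last line is what an MCEM scheme can exploit: treating (τ, Z) and the component labels as missing data, the E-step needs conditional expectations over truncated latent variables, which are approximated by Monte Carlo.

```latex
% Assumed Sahu--Dey--Branco form; \Delta = \mathrm{Diag}(\delta_1,\dots,\delta_p)
\begin{align*}
f(\mathbf{y}) &= 2^{p}\, t_p(\mathbf{y} \mid \boldsymbol\mu, \boldsymbol\Omega, \nu)\,
  T_p\!\left(\mathbf{q}\sqrt{\tfrac{\nu+p}{\nu+\eta}} \,\middle|\, \mathbf{0}, \boldsymbol\Lambda, \nu+p\right),\\
\boldsymbol\Omega &= \boldsymbol\Sigma + \boldsymbol\Delta^{2}, \quad
\mathbf{q} = \boldsymbol\Delta\boldsymbol\Omega^{-1}(\mathbf{y}-\boldsymbol\mu), \quad
\eta = (\mathbf{y}-\boldsymbol\mu)^{\top}\boldsymbol\Omega^{-1}(\mathbf{y}-\boldsymbol\mu), \quad
\boldsymbol\Lambda = \mathbf{I}_p - \boldsymbol\Delta\boldsymbol\Omega^{-1}\boldsymbol\Delta,\\
f(\mathbf{y} \mid \boldsymbol\Theta) &= \sum_{i=1}^{g} w_i\, f(\mathbf{y} \mid \boldsymbol\mu_i, \boldsymbol\Sigma_i, \boldsymbol\Delta_i, \nu_i),
  \qquad w_i \ge 0,\ \sum_{i=1}^{g} w_i = 1,\\
\tau &\sim \mathrm{Gamma}(\nu/2,\ \nu/2), \quad
\mathbf{Z} \mid \tau \sim TN_p(\mathbf{0},\ \tau^{-1}\mathbf{I}_p;\ \mathbb{R}_{+}^{p}), \quad
\mathbf{Y} \mid \mathbf{Z}, \tau \sim N_p(\boldsymbol\mu + \boldsymbol\Delta\mathbf{Z},\ \tau^{-1}\boldsymbol\Sigma).
\end{align*}
```

Here Gamma(ν/2, ν/2) has shape ν/2 and rate ν/2. Integrating Z out of the last line given τ gives 2^p φ_p(y; μ, Ω/τ) Φ_p(√τ q; 0, Λ), and integrating τ out then yields the first line. Letting ν → ∞ recovers the corresponding multivariate skew normal (MSN).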


Keywords: MCEM-type algorithms; MSN; MST; Multivariate truncated normal; Multivariate truncated t; Outliers
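The multivariate skew t (MST) can be built from truncated latent variables, which is also what makes simulation easy. The sketch below assumes the Sahu–Dey–Branco hierarchy τ ~ Gamma(ν/2, rate ν/2), Z | τ ~ N_p(0, I_p/τ) truncated to the positive orthant, and Y | Z, τ ~ N_p(μ + ΔZ, Σ/τ) with Δ = Diag(δ). The function names `rmst`, `rmst_mixture` and `mst_mean` are hypothetical illustrations, not code from the paper. `mst_mean` gives the closed-form mean under the same assumption (valid for ν > 1), which is handy for checking the sampler.

```python
import numpy as np
from math import exp, lgamma, pi, sqrt


def rmst(n, mu, Sigma, delta, nu, rng):
    """Draw n samples from an MST via the assumed hierarchy:
    tau ~ Gamma(nu/2, rate nu/2); Z | tau ~ TN_p(0, I/tau; Z > 0);
    Y | Z, tau ~ N_p(mu + Diag(delta) Z, Sigma / tau)."""
    mu = np.asarray(mu, dtype=float)
    delta = np.asarray(delta, dtype=float)
    p = mu.size
    # NumPy's gamma uses shape/scale, so rate nu/2 corresponds to scale 2/nu.
    tau = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    scale = 1.0 / np.sqrt(tau)[:, None]
    # |N(0, I)| componentwise is N(0, I) truncated to the positive orthant.
    z = np.abs(rng.standard_normal((n, p))) * scale
    # Correlated Gaussian noise with covariance Sigma / tau via Cholesky.
    L = np.linalg.cholesky(np.asarray(Sigma, dtype=float))
    eps = (rng.standard_normal((n, p)) @ L.T) * scale
    # Delta is diagonal, so Delta @ z_j is an elementwise product.
    return mu + z * delta + eps


def rmst_mixture(n, weights, params, rng):
    """Sample from a g-component MST mixture.
    params is a list of (mu, Sigma, delta, nu) tuples, one per component."""
    labels = rng.choice(len(weights), size=n, p=weights)
    p = len(params[0][0])
    y = np.empty((n, p))
    for i, (mu, Sigma, delta, nu) in enumerate(params):
        idx = np.flatnonzero(labels == i)
        y[idx] = rmst(idx.size, mu, Sigma, delta, nu, rng)
    return y, labels


def mst_mean(mu, delta, nu):
    """E[Y] = mu + delta * sqrt(nu/pi) * Gamma((nu-1)/2) / Gamma(nu/2), nu > 1."""
    c = sqrt(nu / pi) * exp(lgamma((nu - 1) / 2) - lgamma(nu / 2))
    return np.asarray(mu, dtype=float) + c * np.asarray(delta, dtype=float)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Sigma = [[1.0, 0.5], [0.5, 2.0]]
    y, labels = rmst_mixture(
        5000, [0.4, 0.6],
        [([0.0, 0.0], Sigma, [2.0, 1.0], 4.0), ([6.0, 3.0], Sigma, [-1.0, 0.0], 10.0)],
        rng,
    )
    print(y.shape, np.bincount(labels))
```

Setting all δ to zero gives symmetric multivariate t components, and heavy tails come from small ν, which is the source of robustness to outliers in such mixtures.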


Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Applied Mathematics and Institute of Statistics, National Chung Hsing University, Taichung, Taiwan