Maximum likelihood inference for mixtures of skew Student-t-normal distributions through practical EM-type algorithms

Statistics and Computing

Abstract

This paper deals with the problem of maximum likelihood estimation for a mixture of skew Student-t-normal distributions, which is a novel model-based tool for clustering heterogeneous (multiple groups) data in the presence of skewed and heavy-tailed outcomes. We present two analytically simple EM-type algorithms for iteratively computing the maximum likelihood estimates. The observed information matrix is derived for obtaining the asymptotic standard errors of parameter estimates. A small simulation study is conducted to demonstrate the superiority of the skew Student-t-normal distribution compared to the skew t distribution. The proposed methodology is particularly useful for analyzing multimodal asymmetric data as produced by major biotechnological platforms like flow cytometry. We provide such an application with the help of an illustrative example.
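As a rough illustration of the model class named in the abstract, the sketch below evaluates a skew Student-t-normal (STN) density, a finite mixture of such densities, and the posterior component probabilities that the E-step of any mixture EM algorithm needs. It assumes the common parameterization f(y) = (2/σ) t_ν(z) Φ(λz) with z = (y − ξ)/σ, where t_ν is the Student-t density and Φ is the standard normal distribution function. The function names and the tuple layout (xi, sigma, lam, nu) are illustrative choices, not taken from the paper, and this is not the authors' EM-type algorithm.

```python
import math


def t_pdf(z, nu):
    """Standard Student-t density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(z * z / nu))


def norm_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def stn_pdf(y, xi, sigma, lam, nu):
    """STN density under the assumed parameterization:
    location xi, scale sigma > 0, skewness lam, degrees of freedom nu > 0."""
    z = (y - xi) / sigma
    return 2.0 / sigma * t_pdf(z, nu) * norm_cdf(lam * z)


def mixture_pdf(y, weights, params):
    """Density of a finite STN mixture; params holds one
    (xi, sigma, lam, nu) tuple per component (hypothetical layout)."""
    return sum(w * stn_pdf(y, *p) for w, p in zip(weights, params))


def responsibilities(y, weights, params):
    """Posterior probability that y belongs to each component.
    No guard against underflow when y is far from every component."""
    comp = [w * stn_pdf(y, *p) for w, p in zip(weights, params)]
    total = sum(comp)
    return [c / total for c in comp]


if __name__ == "__main__":
    weights = [0.4, 0.6]
    params = [(-2.0, 1.0, 3.0, 5.0), (3.0, 1.5, -2.0, 4.0)]
    print(mixture_pdf(0.0, weights, params))
    print(responsibilities(0.0, weights, params))
```

With lam = 0 the STN density reduces to a scaled Student-t density, which gives a quick sanity check on the implementation. A full fit would alternate the responsibilities above with parameter updates, which this sketch does not attempt.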

Author information

Corresponding author

Correspondence to Tsung I. Lin.

About this article

Cite this article

Ho, H.J., Pyne, S. & Lin, T.I. Maximum likelihood inference for mixtures of skew Student-t-normal distributions through practical EM-type algorithms. Stat Comput 22, 287–299 (2012). https://doi.org/10.1007/s11222-010-9225-9

  • DOI: https://doi.org/10.1007/s11222-010-9225-9
