
Constrained monotone EM algorithms for mixtures of multivariate t distributions

Abstract

Mixtures of multivariate t distributions provide a robust parametric extension of normal mixtures for fitting data. In the presence of a noise component, potential outliers, or data with longer-than-normal tails, one way to broaden the model is to use t distributions. In this framework, the degrees of freedom act as a robustness parameter: they tune the heaviness of the tails and downweight the effect of outliers on parameter estimation. The aim of this paper is to extend some theoretical results on likelihood maximization over constrained parameter spaces to mixtures of multivariate elliptical distributions. Further, a constrained monotone algorithm implementing maximum likelihood mixture decomposition of multivariate t distributions is proposed, to achieve improved convergence properties and robustness. Monte Carlo simulations and a real data study illustrate the better performance of the algorithm compared with earlier proposals.
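The downweighting and the constrained, monotone fitting described above can be illustrated with a small sketch. This is not the authors' algorithm: it is a minimal EM iteration for a G-component mixture of multivariate t distributions, assuming the degrees of freedom ν are fixed and known, and assuming the constraint takes the form of bounds a ≤ λ ≤ b on the eigenvalues of every component scale matrix. The names `t_logpdf` and `constrained_em_step` and the values of `nu`, `a` and `b` are hypothetical. In the E-step each point gets a weight u = (ν + p)/(ν + δ), where δ is its squared Mahalanobis distance, so points far from a component center contribute less to that component's location and scale. In the M-step the eigenvalues of the weighted scatter matrix are clipped to [a, b]; for fixed weights and location this clipped matrix maximizes the M-step objective over the constrained set, so the log-likelihood should not decrease between iterations, and the lower bound a keeps the scale matrices away from singularity, where the likelihood is unbounded.

```python
import math
import numpy as np


def t_logpdf(X, mu, Sigma, nu):
    """Log-density of a multivariate t (location mu, scale Sigma, nu d.o.f.)
    and the squared Mahalanobis distance of each row of X."""
    n, p = X.shape
    L = np.linalg.cholesky(Sigma)
    diff = np.linalg.solve(L, (X - mu).T)      # L^{-1}(x - mu), shape (p, n)
    delta = np.sum(diff ** 2, axis=0)          # squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    const = (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
             - 0.5 * p * math.log(nu * math.pi))
    return const - 0.5 * log_det - 0.5 * (nu + p) * np.log1p(delta / nu), delta


def constrained_em_step(X, pi, mus, Sigmas, nu, a, b):
    """Hypothetical sketch: one EM iteration with fixed nu and the eigenvalues
    of each scale matrix clipped to [a, b]. Returns the updated parameters and
    the log-likelihood evaluated at the *input* parameters."""
    n, p = X.shape
    G = len(pi)
    log_f = np.empty((n, G))
    delta = np.empty((n, G))
    for k in range(G):
        log_f[:, k], delta[:, k] = t_logpdf(X, mus[k], Sigmas[k], nu)
    log_w = log_f + np.log(pi)
    log_mix = np.logaddexp.reduce(log_w, axis=1)   # per-point log mixture density
    z = np.exp(log_w - log_mix[:, None])           # posterior memberships
    u = (nu + p) / (nu + delta)                    # small for distant points
    new_pi, new_mus, new_Sigmas = [], [], []
    for k in range(G):
        n_k = z[:, k].sum()
        w = z[:, k] * u[:, k]
        mu = w @ X / w.sum()                       # weighted location update
        diff = X - mu
        S = (w[:, None] * diff).T @ diff / n_k     # unconstrained scale update
        evals, evecs = np.linalg.eigh(S)
        evals = np.clip(evals, a, b)               # enforce a <= lambda <= b
        new_pi.append(n_k / n)
        new_mus.append(mu)
        new_Sigmas.append((evecs * evals) @ evecs.T)
    return np.array(new_pi), new_mus, new_Sigmas, log_mix.sum()


# Usage: two clusters plus a few gross outliers (synthetic data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(5.0, 1.0, size=(100, 2)),
               rng.uniform(-20, 20, size=(10, 2))])
pi = np.array([0.5, 0.5])
mus = [X[0].copy(), X[150].copy()]
Sigmas = [np.eye(2), np.eye(2)]
history = []
for _ in range(50):
    pi, mus, Sigmas, ll = constrained_em_step(X, pi, mus, Sigmas,
                                              nu=4.0, a=0.1, b=10.0)
    history.append(ll)
```

Holding ν fixed keeps every iteration a plain EM step; estimating ν as well would need an additional conditional maximization step, which this sketch omits.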



Author information


Correspondence to F. Greselin.


About this article

Cite this article

Greselin, F., Ingrassia, S. Constrained monotone EM algorithms for mixtures of multivariate t distributions. Stat Comput 20, 9–22 (2010). https://doi.org/10.1007/s11222-008-9112-9


Keywords

  • Finite mixture models
  • EM algorithm
  • t Distribution
  • Clustering