Statistics and Computing

Volume 24, Issue 6, pp 971–984

A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweight: application to robust clustering

  • Florence Forbes
  • Darren Wraith

Abstract

We propose a family of multivariate heavy-tailed distributions that allow variable marginal amounts of tailweight. The novelty lies in introducing multidimensional, rather than univariate, scale variables for the Gaussian scale mixture family of distributions. In contrast to most existing approaches, the derived distributions can account for a variety of shapes and have a simple, tractable form with a closed-form probability density function in any dimension. We examine a number of properties of these distributions and illustrate them in the particular cases of Pearson type VII and t tails. For these cases, we provide maximum likelihood estimation of the parameters and illustrate their modelling flexibility on simulated and real data clustering examples.
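To make the idea of multidimensional scale variables concrete, the following is a minimal sampling sketch, not the authors' implementation. It assumes one plausible reading of the construction: the scale matrix is decomposed as D diag(A) Dᵀ (suggested by the keyword "covariance matrix decomposition"), and each eigen-direction m receives its own independent Gamma(ν_m/2, rate ν_m/2) weight, which would give t-type tails with direction-specific degrees of freedom. The function name `sample_multiple_scaled_t` and all parameter names are hypothetical.

```python
import numpy as np


def sample_multiple_scaled_t(n, mu, D, A, nu, rng=None):
    """Draw n samples from an assumed multiple-scaled t-type distribution.

    Assumptions (not taken from the paper's text):
      mu : (M,) location vector
      D  : (M, M) orthogonal matrix of eigenvectors
      A  : (M,) positive eigenvalues, so the scale matrix is D diag(A) D^T
      nu : (M,) degrees of freedom, one per eigen-direction
    """
    rng = np.random.default_rng(rng)
    mu = np.asarray(mu, dtype=float)
    D = np.asarray(D, dtype=float)
    A = np.asarray(A, dtype=float)
    nu = np.asarray(nu, dtype=float)
    M = mu.shape[0]

    # One weight per sample AND per dimension; a standard multivariate t
    # would instead share a single weight across all dimensions.
    W = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=(n, M))

    # Gaussian draws in the eigenbasis, each coordinate rescaled by its
    # own weight: variance A_m / W_m along direction m.
    X = rng.standard_normal((n, M))
    Z = X * np.sqrt(A / W)

    # Rotate back to the original coordinates and shift by the location.
    return mu + Z @ D.T


if __name__ == "__main__":
    theta = np.pi / 6
    D = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    # Heavy tails along the first eigen-direction, near-Gaussian along the second.
    Y = sample_multiple_scaled_t(5000, [0.0, 0.0], D, [2.0, 0.5], [2.5, 50.0], rng=1)
    print(Y.shape)
```

Under these assumptions, setting every ν_m very large recovers an approximately Gaussian sample, while mixing small and large values yields different tailweights along different directions.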

Keywords

Covariance matrix decomposition; EM algorithm; Gaussian scale mixture; Multivariate generalized t-distribution; Outlier detection

Supplementary material

11222_2013_9414_MOESM1_ESM.pdf (PDF, 2.2 MB)
Missing Appendices, Tables, and Figures are available in a companion supplemental file.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. INRIA, Laboratoire Jean Kuntzmann, Mistis team, Saint-Ismier Cedex, France
