Advances in Data Analysis and Classification

Volume 13, Issue 1, pp 145–173

Finite mixtures, projection pursuit and tensor rank: a triangulation

  • Nicola Loperfido
Regular Article


Finite mixtures of multivariate distributions play a fundamental role in model-based clustering. However, they pose several problems, especially in the presence of many irrelevant variables. Dimension reduction methods, such as projection pursuit, are commonly used to address these problems. In this paper, we use skewness-maximizing projections to recover the subspace which optimally separates the cluster means. Skewness might then be removed in order to search for other potentially interesting data structures or to perform skewness-sensitive statistical analyses, such as Hotelling’s \(T^{2}\) test. Our approach is algebraic in nature and deals with the symmetric tensor rank of the third multivariate cumulant. We also derive closed-form expressions for the symmetric tensor rank of the third cumulants of several multivariate mixture models, including mixtures of skew-normal distributions and mixtures of two symmetric components with proportional covariance matrices. The theoretical results in this paper shed some light on the connection between the estimated number of mixture components and their skewness.
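The link between skewness-maximizing projections, the third cumulant and the direction separating the cluster means can be illustrated in the simplest setting: a mixture of two normal components with a common covariance matrix. The sketch below is an illustration under stated assumptions, not the paper's implementation. The weights, means and covariance matrix are hypothetical values, and the computation works with population moments rather than data. It builds the third cumulant tensor, checks that its unfolding has matrix rank one (consistent with a symmetric rank-one tensor), and compares the direction obtained from the whitened tensor with the linear discriminant direction \(\Sigma^{-1}(\mu_{1}-\mu_{2})\). Using the leading singular vector as the skewness-maximizing direction is justified here only because the tensor has rank one.

```python
import numpy as np

# Hypothetical example values (assumptions, not taken from the paper).
weights = np.array([0.8, 0.2])
means = np.array([[2.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
cov = np.array([[2.0, 0.5, 0.0, 0.0],   # common within-component covariance
                [0.5, 1.0, 0.3, 0.0],
                [0.0, 0.3, 1.5, 0.2],
                [0.0, 0.0, 0.2, 1.0]])
d = cov.shape[0]

# Mixture mean, component deviations and total covariance of the mixture.
mu = weights @ means
dev = means - mu
cov_x = cov + np.einsum("g,gi,gj->ij", weights, dev, dev)

# Third cumulant (third central moment) of the mixture, component by component.
# For a normal component with deviation m from the mixture mean:
# E[(x-mu)_i (x-mu)_j (x-mu)_k] = m_i m_j m_k + m_i S_jk + m_j S_ik + m_k S_ij.
k3 = np.zeros((d, d, d))
for w, m in zip(weights, dev):
    k3 += w * (np.einsum("i,j,k->ijk", m, m, m)
               + np.einsum("i,jk->ijk", m, cov)
               + np.einsum("j,ik->ijk", m, cov)
               + np.einsum("k,ij->ijk", m, cov))

# Whiten with the inverse symmetric square root of the total covariance.
vals, vecs = np.linalg.eigh(cov_x)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T
k3_z = np.einsum("ai,bj,ck,ijk->abc", W, W, W, k3)

# Unfold the whitened tensor into a d x d^2 matrix and inspect its rank.
U, s, _ = np.linalg.svd(k3_z.reshape(d, d * d))
rank = int(np.sum(s > 1e-10 * s[0]))

# Map the leading singular vector back to the original coordinates and
# compare it with the linear discriminant direction.
direction = W @ U[:, 0]
lda = np.linalg.solve(cov, means[0] - means[1])
cosine = abs(direction @ lda) / (np.linalg.norm(direction) * np.linalg.norm(lda))

print("rank of unfolded third cumulant:", rank)
print("|cosine| with discriminant direction:", cosine)
```

In this setting the covariance terms cancel because the weighted deviations sum to zero, so the third cumulant reduces to \(\pi_{1}\pi_{2}(\pi_{2}-\pi_{1})\,\delta\otimes\delta\otimes\delta\) with \(\delta=\mu_{1}-\mu_{2}\). The reported rank should therefore be one and the absolute cosine should be numerically close to one. With equal weights the tensor vanishes and skewness carries no information about the separating direction.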


Finite mixture, Linear discriminant function, Projection pursuit, Skewness, Symmetrization, Tensor rank

Mathematics Subject Classification

46B28 62H30 



The author would like to thank an anonymous Associate Editor and two anonymous Reviewers for their care in handling this paper and for their valuable comments, which greatly helped to improve its quality.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Dipartimento di Economia, Società e Politica, Università degli Studi di Urbino “Carlo Bo”, Urbino, Italy
