Abstract
Most of the existing literature on model-based clustering deals with symmetric components. When subpopulations are skewed, the estimated number of groups can be misleading: if symmetric components are assumed, more than one component may be needed to describe a single asymmetric group. Existing mixture models based on multivariate normal and multivariate t distributions fit symmetric distributions, i.e. symmetric clusters. In this paper, we propose finite mixtures of the normal inverse Gaussian distribution and its multivariate extensions. These mixture models are built on a density that allows for skewness and fat tails; they generalize the existing models, are tractable and have desirable properties. We examine both the univariate case, to gain insight, and the multivariate case, which is more useful in real applications. EM-type algorithms are described for fitting the models, and real data examples are used to demonstrate the potential of the new model compared with existing ones.
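As a rough illustration of the univariate case, the sketch below fits a finite mixture of NIG densities by EM. It assumes the common parameterization NIG(α, β, μ, δ) with 0 ≤ |β| < α, δ > 0 and γ = √(α² − β²), and the normal variance-mean mixture representation X | Z = z ~ N(μ + βz, z), where Z is inverse Gaussian with mean δ/γ and shape δ². Treating both the component labels and Z as missing data, Z given x follows a generalized inverse Gaussian law GIG(−1, δ² + (x − μ)², α²), so the E-step only needs ratios of modified Bessel functions and the M-step has closed forms. The function names, the crude initialization and the fixed iteration count are hypothetical choices for illustration, not the authors' implementation, and the multivariate extension is not covered.

```python
import numpy as np
from scipy.special import k0e, k1e, kve, logsumexp


def nig_logpdf(x, alpha, beta, mu, delta):
    """Log-density of the univariate NIG(alpha, beta, mu, delta), with 0 <= |beta| < alpha, delta > 0."""
    gamma = np.sqrt(alpha**2 - beta**2)
    q = np.sqrt(delta**2 + (x - mu) ** 2)
    omega = alpha * q
    # log K_1(omega) is evaluated stably as log(k1e(omega)) - omega
    return (np.log(alpha * delta / np.pi) + delta * gamma + beta * (x - mu)
            + np.log(k1e(omega)) - omega - np.log(q))


def nig_sample(rng, n, alpha, beta, mu, delta):
    """Draw X = mu + beta*Z + sqrt(Z)*N(0, 1), Z ~ inverse Gaussian(mean delta/gamma, shape delta**2)."""
    gamma = np.sqrt(alpha**2 - beta**2)
    z = rng.wald(delta / gamma, delta**2, size=n)
    return mu + beta * z + np.sqrt(z) * rng.standard_normal(n)


def fit_nig_mixture(x, K, n_iter=200, seed=0):
    """EM for a K-component mixture of univariate NIG densities (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, sd = x.size, x.std()
    pi = np.full(K, 1.0 / K)
    mu = rng.choice(x, K, replace=False)   # crude start: random data points as locations
    beta = np.zeros(K)                     # symmetric starting components
    alpha = np.full(K, 1.0 / sd)           # with beta = 0, variance = delta / alpha = sd**2
    delta = np.full(K, sd)
    history = []
    for _ in range(n_iter):
        # E-step (labels): responsibilities tau[i, k] under the current parameters
        logp = np.stack([np.log(pi[k]) + nig_logpdf(x, alpha[k], beta[k], mu[k], delta[k])
                         for k in range(K)], axis=1)
        ll = logsumexp(logp, axis=1)
        history.append(ll.sum())           # observed-data log-likelihood before this update
        tau = np.exp(logp - ll[:, None])
        for k in range(K):
            t, nk = tau[:, k], tau[:, k].sum()
            # E-step (mixing variable): Z | x ~ GIG(-1, delta^2 + (x - mu)^2, alpha^2)
            q = np.sqrt(delta[k] ** 2 + (x - mu[k]) ** 2)
            omega = alpha[k] * q
            s = (q / alpha[k]) * k0e(omega) / k1e(omega)       # E[Z | x]
            w = (alpha[k] / q) * kve(2, omega) / k1e(omega)    # E[1/Z | x]
            # M-step: closed-form maximizers of the tau-weighted complete-data log-likelihood
            sbar, wbar = t @ s / nk, t @ w / nk
            xbar, xwbar = t @ x / nk, t @ (x * w) / nk
            mu[k] = (xbar - sbar * xwbar) / (1.0 - sbar * wbar)
            beta[k] = (xbar - mu[k]) / sbar
            delta[k] = np.sqrt(1.0 / (wbar - 1.0 / sbar))
            alpha[k] = np.sqrt((delta[k] / sbar) ** 2 + beta[k] ** 2)   # gamma = delta / sbar
            pi[k] = nk / n
    return pi, alpha, beta, mu, delta, np.array(history)
```

Each iteration is an exact EM update, so the recorded log-likelihood should not decrease. As with normal mixtures, though, the likelihood is unbounded when a component shrinks (δ → 0) onto a single observation, so in practice several starting values and a convergence tolerance would be advisable.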
Cite this article
Karlis, D., Santourian, A. Model-based clustering with non-elliptically contoured distributions. Stat Comput 19, 73–83 (2009). https://doi.org/10.1007/s11222-008-9072-0