Abstract
The mixture of factor analyzers model is extended to variance-gamma mixtures to facilitate flexible clustering of high-dimensional data. The formulation of the variance-gamma distribution used is a special, limiting case of the generalized hyperbolic distribution. Parameter estimation for these mixtures is carried out via an alternating expectation-conditional maximization algorithm and relies on convenient expressions for expected values of the generalized inverse Gaussian distribution. The Bayesian information criterion is used to select the number of latent factors. The mixture of variance-gamma factor analyzers model is illustrated on a well-known breast cancer data set. Finally, the place of variance-gamma mixtures within the growing body of literature on non-Gaussian mixtures is considered.
References
Aitken, A.C.: A series formula for the roots of algebraic and transcendental equations. Proc. R. Soc. Edinb. 45, 14–22 (1926)
Andrews, J.L., McNicholas, P.D.: Extending mixtures of multivariate t-factor analyzers. Stat. Comput. 21 (3), 361–373 (2011)
Andrews, J.L., McNicholas, P.D.: Mixtures of modified t-factor analyzers for model-based clustering, classification, and discriminant analysis. J. Stat. Plan. Inf. 141 (4), 1479–1486 (2011)
Andrews, J.L., McNicholas, P.D.: Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions: the tEIGEN family. Stat. Comput. 22 (5), 1021–1029 (2012)
Andrews, J.L., McNicholas, P.D., Subedi, S.: Model-based classification via mixtures of multivariate t-distributions. Comput. Stat. Data Anal. 55 (1), 520–529 (2011)
Barndorff-Nielsen, O., Halgreen, C.: Infinite divisibility of the hyperbolic and generalized inverse Gaussian distributions. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 38, 309–311 (1977)
Bhattacharya, S., McNicholas, P.D.: A LASSO-penalized BIC for mixture model selection. Adv. Data Anal. Classif. 8 (1), 45–61 (2014)
Biernacki, C., Celeux, G., Govaert, G.: Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Mach. Intell. 22 (7), 719–725 (2000)
Blæsild, P.: The shape of the generalized inverse Gaussian and hyperbolic distributions. Research Report 37, Department of Theoretical Statistics, Aarhus University, Denmark (1978)
Böhning, D., Dietz, E., Schaub, R., Schlattmann, P., Lindsay, B.: The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann. Inst. Stat. Math. 46, 373–388 (1994)
Bouveyron, C., Brunet-Saumard, C.: Model-based clustering of high-dimensional data: a review. Comput. Stat. Data Anal. 71, 52–78 (2014)
Browne, R.P., McNicholas, P.D.: A mixture of generalized hyperbolic distributions. Can. J. Stat. 43 (2), 176–198 (2015)
Browne, R.P., McNicholas, P.D., Sparling, M.D.: Model-based learning using a mixture of mixtures of Gaussian and uniform distributions. IEEE Trans. Pattern Anal. Mach. Intell. 34 (4), 814–817 (2012)
Dang, U.J., Browne, R.P., McNicholas, P.D.: Mixtures of multivariate power exponential distributions. Biometrics 71 (4), 1081–1089 (2015)
Dasgupta, A., Raftery, A.E.: Detecting features in spatial point processes with clutter via model-based clustering. J. Am. Stat. Assoc. 93, 294–302 (1998)
Dean, N., Murphy, T.B., Downey, G.: Using unlabelled data to update classification rules with applications in food authenticity studies. J. R. Stat. Soc. Ser. C 55 (1), 1–14 (2006)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39 (1), 1–38 (1977)
Franczak, B.C., Browne, R.P., McNicholas, P.D.: Mixtures of shifted asymmetric Laplace distributions. IEEE Trans. Pattern Anal. Mach. Intell. 36 (6), 1149–1157 (2014)
Franczak, B.C., Tortora, C., Browne, R.P., McNicholas, P.D.: Unsupervised learning via mixtures of skewed distributions with hypercube contours. Pattern Recogn. Lett. 58 (1), 69–76 (2015)
Ghahramani, Z., Hinton, G.E.: The EM algorithm for factor analyzers. Tech. Rep. CRG-TR-96-1, University of Toronto, Toronto (1997)
Good, I.J.: The population frequencies of species and the estimation of population parameters. Biometrika 40, 237–260 (1953)
Halgreen, C.: Self-decomposability of the generalized inverse Gaussian and hyperbolic distributions. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 47, 13–18 (1979)
Hastie, T., Tibshirani, R.: Discriminant analysis by Gaussian mixtures. J. R. Stat. Soc. Ser. B 58 (1), 155–176 (1996)
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2 (1), 193–218 (1985)
Jørgensen, B.: Statistical Properties of the Generalized Inverse Gaussian Distribution. Springer, New York (1982)
Karlis, D., Meligkotsidou, L.: Finite mixtures of multivariate Poisson distributions with application. J. Stat. Plan. Inf. 137 (6), 1942–1960 (2007)
Kass, R.E., Raftery, A.E.: Bayes factors. J. Am. Stat. Assoc. 90 (430), 773–795 (1995)
Kass, R.E., Wasserman, L.: A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. J. Am. Stat. Assoc. 90 (431), 928–934 (1995)
Kotz, S., Kozubowski, T.J., Podgorski, K.: The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance. Birkhäuser, Boston (2001)
Lawley, D.N., Maxwell, A.E.: Factor analysis as a statistical method. J. R. Stat. Soc. Ser. D 12 (3), 209–229 (1962)
Lee, S.X., McLachlan, G.J.: On mixtures of skew normal and skew t-distributions. Adv. Data Anal. Classif. 7 (3), 241–266 (2013)
Lichman, M.: UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml (2013)
Lin, T.I.: Maximum likelihood estimation for multivariate skew normal mixture models. J. Multivar. Anal. 100, 257–265 (2009)
Lin, T.I.: Robust mixture modeling using multivariate skew t distributions. Stat. Comput. 20 (3), 343–356 (2010)
Lin, T.I., McNicholas, P.D., Ho, H.J.: Capturing patterns via parsimonious t mixture models. Stat. Probab. Lett. 88, 80–87 (2014)
Lindsay, B.G.: Mixture models: Theory, geometry and applications. In: NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5, Institute of Mathematical Statistics, Hayward, CA (1995)
Lopes, H.F., West, M.: Bayesian model assessment in factor analysis. Stat. Sin. 14, 41–67 (2004)
McLachlan, G.J.: Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York (1992)
McLachlan, G.J., Peel, D.: Mixtures of factor analyzers. In: Proceedings of the Seventeenth International Conference on Machine Learning, Morgan Kaufmann, San Francisco, pp. 599–606 (2000)
McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press, Princeton (2005)
McNicholas, P.D.: Model-based classification using latent Gaussian mixture models. J. Stat. Plan. Inf. 140 (5), 1175–1181 (2010)
McNicholas, P.D.: Mixture Model-Based Classification. Chapman & Hall/CRC Press, Boca Raton (2016)
McNicholas, P.D., Murphy, T.B.: Parsimonious Gaussian mixture models. Stat. Comput. 18 (3), 285–296 (2008)
McNicholas, P.D., Murphy, T.B., McDaid, A.F., Frost, D.: Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Comput. Stat. Data Anal. 54 (3), 711–723 (2010)
Meng, X.L., van Dyk, D.: The EM algorithm—an old folk song sung to a fast new tune (with discussion). J. R. Stat. Soc. Ser. B 59 (3), 511–567 (1997)
Murray, P.M., Browne, R.P., McNicholas, P.D.: Mixtures of skew-t factor analyzers. Comput. Stat. Data Anal. 77, 326–335 (2014)
Murray, P.M., McNicholas, P.D., Browne, R.P.: A mixture of common skew-t factor analyzers. Stat 3 (1), 68–82 (2014)
O’Hagan, A., Murphy, T.B., Gormley, I.C., McNicholas, P.D., Karlis, D.: Clustering with the multivariate normal inverse Gaussian distribution. Comput. Stat. Data Anal. 93, 18–30 (2016)
Peel, D., McLachlan, G.J.: Robust mixture modelling using the t distribution. Stat. Comput. 10 (4), 339–348 (2000)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2015)
Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66 (336), 846–850 (1971)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Steane, M.A., McNicholas, P.D., Yada, R.: Model-based classification via mixtures of multivariate t-factor analyzers. Commun. Stat. Simul. Comput. 41 (4), 510–523 (2012)
Subedi, S., McNicholas, P.D.: Variational Bayes approximations for clustering via mixtures of normal inverse Gaussian distributions. Adv. Data Anal. Classif. 8 (2), 167–193 (2014)
Tortora, C., McNicholas, P.D., Browne, R.P.: A mixture of generalized hyperbolic factor analyzers. Adv. Data Anal. Classif. (2015, to appear). doi: 10.1007/s11634-015-0204-z
Vrbik, I., McNicholas, P.D.: Analytic calculations for the EM algorithm for multivariate skew-mixture models. Stat. Probab. Lett. 82 (6), 1169–1174 (2012)
Vrbik, I., McNicholas, P.D.: Parsimonious skew mixture models for model-based clustering and classification. Comput. Stat. Data Anal. 71, 196–210 (2014)
Vrbik, I., McNicholas, P.D.: Fractionally-supervised classification. J. Classif. 32 (3), 359–381 (2015)
Woodbury, M.A.: Inverting modified matrices. Statistical Research Group, Memorandum Report 42. Princeton University, Princeton, NJ (1950)
Acknowledgements
The authors are grateful to an anonymous reviewer for providing helpful comments. This work is supported by an Alexander Graham Bell Scholarship (CGS-D) from the Natural Sciences and Engineering Research Council of Canada (S.M. McNicholas).
Appendix
See Fig. 3.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
McNicholas, S.M., McNicholas, P.D., Browne, R.P. (2017). A Mixture of Variance-Gamma Factor Analyzers. In: Ahmed, S. (eds) Big and Complex Data Analysis. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-41573-4_18
DOI: https://doi.org/10.1007/978-3-319-41573-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41572-7
Online ISBN: 978-3-319-41573-4
eBook Packages: Mathematics and Statistics (R0)