A Mixture of Variance-Gamma Factor Analyzers

Chapter in: Big and Complex Data Analysis

Part of the book series: Contributions to Statistics (CONTRIB.STAT.)

Abstract

The mixture of factor analyzers model is extended to variance-gamma mixtures to facilitate flexible clustering of high-dimensional data. The formulation of the variance-gamma distribution used here is a special and limiting case of the generalized hyperbolic distribution. Parameter estimation for these mixtures is carried out via an alternating expectation-conditional maximization algorithm, and relies on convenient expressions for expected values of the generalized inverse Gaussian distribution. The Bayesian information criterion is used to select the number of latent factors. The mixture of variance-gamma factor analyzers model is illustrated on a well-known breast cancer data set. Finally, the place of variance-gamma mixtures within the growing body of literature on non-Gaussian mixtures is considered.
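
As a rough illustration of how the Bayesian information criterion can be used to choose the number of latent factors q, the sketch below scores a set of candidate fits and keeps the best one. It assumes the "larger is better" form BIC = 2 l - ρ log n, where l is the maximized log-likelihood, ρ the number of free parameters, and n the number of observations; the sign convention, the function names, and the toy numbers are assumptions made for illustration and are not taken from the chapter.

```python
import math


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """BIC in the 'larger is better' form: 2*loglik - n_params*log(n_obs).

    The sign convention is an assumption; some references use the negative.
    """
    return 2.0 * loglik - n_params * math.log(n_obs)


def select_num_factors(fits: dict, n_obs: int):
    """Pick the number of latent factors q with the largest BIC.

    `fits` maps each candidate q to a pair
    (maximized log-likelihood, number of free parameters).
    """
    scores = {q: bic(ll, k, n_obs) for q, (ll, k) in fits.items()}
    best_q = max(scores, key=scores.get)
    return best_q, scores


# Toy numbers for illustration only; they are NOT results from the chapter.
toy_fits = {
    1: (-1250.0, 40),
    2: (-1180.0, 55),
    3: (-1175.0, 69),
}
best_q, scores = select_num_factors(toy_fits, n_obs=200)
print(best_q, scores)
```

In this toy example the gain in log-likelihood from q = 2 to q = 3 is too small to offset the penalty for the extra parameters, so the criterion favours the smaller model.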

Acknowledgements

The authors are grateful to an anonymous reviewer for providing helpful comments. This work is supported by an Alexander Graham Bell Scholarship (CGS-D) from the Natural Sciences and Engineering Research Council of Canada (S.M. McNicholas).

Author information

Corresponding author

Correspondence to Paul D. McNicholas.

Appendix

See Fig. 3.

Fig. 3. Plot of BIC value versus number of latent factors q for the MVGFA model fitted to the Wisconsin breast cancer data, focusing on q ∈ [16, 22]

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

McNicholas, S.M., McNicholas, P.D., Browne, R.P. (2017). A Mixture of Variance-Gamma Factor Analyzers. In: Ahmed, S. (eds) Big and Complex Data Analysis. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-41573-4_18