Mixtures of Gaussian copula factor analyzers for clustering high dimensional data

Journal of the Korean Statistical Society

Abstract

Mixtures of factor analyzers are a useful model-based clustering method that can avoid the curse of dimensionality in high-dimensional clustering. However, this approach is sensitive both to diverse forms of non-normality in the marginal variables and to outliers, which are commonly observed in multivariate experiments. We propose mixtures of Gaussian copula factor analyzers (MGCFA) for clustering high-dimensional data. This model has two advantages: (1) it allows different marginal distributions, which makes the mixture model more flexible to fit, and (2) it avoids the curse of dimensionality by embedding the factor-analytic structure in the component correlation matrices of the mixture distribution.
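To make the two ingredients concrete, the following is a minimal sketch of a single component density: a Gaussian copula joins arbitrary marginals, and its correlation matrix is built from a low-rank factor structure. The abstract does not give the parameterization, so the form used here (rescaling ΛΛᵀ + Ψ to unit diagonal) and the names `factor_correlation` and `gaussian_copula_logpdf` are assumptions made for illustration, not the authors' implementation.

```python
import math
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()


def factor_correlation(loadings, uniquenesses):
    """Correlation matrix with factor structure (assumed parameterization).

    loadings: (p, q) array Lambda, uniquenesses: (p,) positive array Psi.
    Lambda Lambda^T + diag(Psi) is rescaled so its diagonal equals one.
    """
    cov = loadings @ loadings.T + np.diag(uniquenesses)
    scale = 1.0 / np.sqrt(np.diag(cov))
    return cov * np.outer(scale, scale)


def gaussian_copula_logpdf(x, marginals, corr):
    """Log-density of one observation under a Gaussian copula.

    marginals: one (cdf, logpdf) pair of callables per variable.
    corr: (p, p) correlation matrix of the copula.
    """
    # Map each coordinate to a normal score z_j = Phi^{-1}(F_j(x_j)).
    u = [min(max(cdf(float(xj)), 1e-12), 1.0 - 1e-12)
         for xj, (cdf, _) in zip(x, marginals)]
    z = np.array([_STD_NORMAL.inv_cdf(uj) for uj in u])
    # Copula term: -0.5 * log|R| - 0.5 * z^T (R^{-1} - I) z.
    _, logdet = np.linalg.slogdet(corr)
    quad = z @ (np.linalg.solve(corr, z) - z)
    log_copula = -0.5 * logdet - 0.5 * quad
    # Marginal term: sum of the marginal log-densities.
    log_marg = sum(logpdf(float(xj)) for xj, (_, logpdf) in zip(x, marginals))
    return log_copula + log_marg


# Illustrative marginals (hypothetical choices): exponential, logistic, normal.
exponential = (lambda v: 1.0 - math.exp(-v), lambda v: -v)
logistic = (lambda v: 1.0 / (1.0 + math.exp(-v)),
            lambda v: -v - 2.0 * math.log1p(math.exp(-v)))
normal = (_STD_NORMAL.cdf, lambda v: math.log(_STD_NORMAL.pdf(v)))

# Three variables, one latent factor.
R = factor_correlation(np.array([[0.8], [0.5], [0.3]]),
                       np.array([0.4, 0.6, 0.9]))
value = gaussian_copula_logpdf(np.array([0.7, -0.2, 1.1]),
                               [exponential, logistic, normal], R)
```

With p variables and q factors, the correlation matrix in this sketch is governed by on the order of p·q parameters instead of p(p−1)/2, which is the sense in which a factor-analytic structure keeps the model manageable when p is large.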

An EM algorithm is developed for fitting MGCFA. The proposed method is free of the curse of dimensionality and allows any parametric marginal distribution that best fits the data. It is applied to both synthetic data and a microarray gene expression data set for clustering, where it performs better than several existing methods.
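The abstract does not spell out the EM updates, so the sketch below shows only the part common to any finite mixture fitted by EM: the E-step that turns component log-densities into posterior cluster probabilities, and the closed-form update of the mixing proportions. The function names are hypothetical, and the model-specific updates for the marginal, loading, and uniqueness parameters are deliberately left out.

```python
import numpy as np


def responsibilities(log_dens, weights):
    """E-step for a finite mixture.

    log_dens: (n, g) array, log-density of each observation under each component.
    weights: (g,) mixing proportions summing to one.
    Returns the (n, g) posterior probabilities and the observed log-likelihood.
    """
    log_joint = log_dens + np.log(weights)
    # Log-sum-exp over components, computed stably.
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    return np.exp(log_joint - log_norm), float(log_norm.sum())


def update_weights(resp):
    """M-step for the mixing proportions: average responsibility per component."""
    return resp.mean(axis=0)


# Toy input: 4 observations, 2 components (log-densities are made up).
log_dens = np.array([[-1.0, -3.0],
                     [-2.5, -0.5],
                     [-1.2, -1.2],
                     [-4.0, -0.8]])
resp, loglik = responsibilities(log_dens, np.array([0.5, 0.5]))
new_weights = update_weights(resp)
labels = resp.argmax(axis=1)  # hard cluster assignment
```

Working in log space matters in high dimensions, where individual component densities can underflow to zero long before their ratios become uninformative.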

Corresponding author

Correspondence to Jangsun Baek.

Cite this article

Zhang, L., Baek, J. Mixtures of Gaussian copula factor analyzers for clustering high dimensional data. J. Korean Stat. Soc. 48, 480–492 (2019). https://doi.org/10.1016/j.jkss.2018.12.001

  • DOI: https://doi.org/10.1016/j.jkss.2018.12.001
