Mixtures of multivariate restricted skew-normal factor analyzer models in a Bayesian framework

  • Original Paper
Computational Statistics

Abstract

The mixture of factor analyzers (MFA) model reduces the number of free parameters through a factor-analytic representation of the component covariance matrices, which makes it an important statistical model for identifying hidden or latent groups in high-dimensional data. Recent extensions of the model to skewed data, or to skewness in the latent groups, have been examined in a frequentist setting, where there are some known computational limitations. For these reasons we consider a Bayesian approach to the restricted skew-normal mixture of factor analyzers model. We examine the performance and flexibility of the approach on real datasets and illustrate some of its computational advantages in a missing-data setting.
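
To make the parameter reduction concrete, the following minimal sketch compares the number of free covariance parameters in a single mixture component under an unrestricted covariance matrix and under the usual factor-analytic form Sigma = B B^T + D, with B a p x q loading matrix and D diagonal. The function names, the values of p and q, and the standard count pq + p - q(q-1)/2 are assumptions for illustration; they are not taken from the article, and the skewness parameters of the restricted skew-normal model are not counted.

```python
def full_cov_params(p: int) -> int:
    """Free parameters in one unrestricted symmetric p x p covariance matrix."""
    return p * (p + 1) // 2


def factor_cov_params(p: int, q: int) -> int:
    """Free parameters in Sigma = B B^T + D (assumed standard MFA form).

    B is p x q (p*q loadings), D is diagonal (p uniquenesses), and
    q*(q-1)/2 parameters are removed to account for the rotational
    indeterminacy of the loadings.
    """
    return p * q + p - q * (q - 1) // 2


# Hypothetical dimensions chosen only for illustration.
p, q = 50, 3
print(full_cov_params(p))       # 1275
print(factor_cov_params(p, q))  # 197
```

With these illustrative dimensions the factor-analytic form needs 197 covariance parameters per component instead of 1275, and the gap widens as p grows with q held fixed.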

Acknowledgements

The authors would like to thank the associate editor and anonymous reviewers for their suggestions, corrections and encouragement, which helped us to improve earlier versions of the manuscript. We would also like to acknowledge helpful discussions with Geoff McLachlan and Sharon Lee (UQ) in the preparation of this work.

Author information

Corresponding author

Correspondence to Darren Wraith.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Maleki, M., Wraith, D. Mixtures of multivariate restricted skew-normal factor analyzer models in a Bayesian framework. Comput Stat 34, 1039–1053 (2019). https://doi.org/10.1007/s00180-019-00870-6

  • DOI: https://doi.org/10.1007/s00180-019-00870-6
