Robust clustering via mixtures of t factor analyzers with incomplete data

  • Regular Article
  • Published in Advances in Data Analysis and Classification

Abstract

Mixtures of t factor analyzers (MtFA) are powerful and widely used tools for robust clustering of high-dimensional data in the presence of outliers. However, missing values can make fitting the MtFA model analytically intractable and computationally demanding. We explicitly derive the score vector and Hessian matrix of the MtFA model with incomplete data to approximate the information matrix, which allows some asymptotic properties to be established under certain regularity conditions. Three expectation-maximization (EM)-based algorithms are developed for maximum likelihood estimation of the MtFA model when values are possibly missing at random. Practical issues related to the recovery of missing values and the clustering of partially observed samples are also investigated. The utility of our methodology is illustrated through the analysis of simulated and real data sets.
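As a rough illustration of the per-sample quantities an EM-type algorithm needs when some values are missing, the sketch below evaluates one t factor analyzer with covariance Sigma = B B^T + D on the observed coordinates only. It is plain Python written for this page, not the authors' algorithm or code. For a sample whose observed part y_o has dimension p_o, it returns three things: the log density of the marginal t distribution of y_o; the expected latent weight E[tau | y_o] = (nu + p_o) / (nu + delta_o), where delta_o is the Mahalanobis distance of y_o; and the conditional-mean imputation E[y_m | y_o] = mu_m + Sigma_mo Sigma_oo^{-1} (y_o - mu_o). The sketch assumes that D is diagonal and that missing entries are marked `None` and treated as missing at random. It also inverts the observed block with a dense Cholesky factorization rather than the more efficient Woodbury-type identities a full implementation would probably use. All function names are hypothetical.

```python
import math

def outer_loadings(B):
    """Return B @ B.T for a p x q loadings matrix B given as a list of rows."""
    return [[sum(bi * bj for bi, bj in zip(ri, rj)) for rj in B] for ri in B]

def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def tfa_observed_estep(y, mu, B, D, nu):
    """Hypothetical helper: E-step quantities for one sample y (None = missing)
    under one t factor analyzer with Sigma = B B^T + diag(D).

    Returns (log density of y_o, E[tau | y_o], y with missing entries imputed).
    """
    p = len(y)
    obs = [j for j in range(p) if y[j] is not None]
    mis = [j for j in range(p) if y[j] is None]
    BBt = outer_loadings(B)
    Sigma = [[BBt[i][j] + (D[i] if i == j else 0.0) for j in range(p)]
             for i in range(p)]
    S_oo = [[Sigma[i][j] for j in obs] for i in obs]   # observed block only
    r = [y[j] - mu[j] for j in obs]                    # y_o - mu_o
    L = cholesky(S_oo)
    w = chol_solve(L, r)                               # Sigma_oo^{-1} (y_o - mu_o)
    delta = sum(ri * wi for ri, wi in zip(r, w))       # Mahalanobis distance
    p_o = len(obs)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(p_o))
    # Log density of the p_o-variate t distribution t(mu_o, Sigma_oo, nu).
    logdens = (math.lgamma((nu + p_o) / 2) - math.lgamma(nu / 2)
               - 0.5 * p_o * math.log(nu * math.pi) - 0.5 * logdet
               - 0.5 * (nu + p_o) * math.log1p(delta / nu))
    tau = (nu + p_o) / (nu + delta)                    # small for outlying samples
    y_hat = list(y)
    for m in mis:                                      # mu_m + Sigma_mo w
        y_hat[m] = mu[m] + sum(Sigma[m][o] * wo for o, wo in zip(obs, w))
    return logdens, tau, y_hat

def posterior_probs(y, mix_weights, comps):
    """Posterior membership probabilities of a partially observed y,
    where comps is a list of (mu, B, D, nu) tuples."""
    logs = [math.log(w_k) + tfa_observed_estep(y, *c)[0]
            for w_k, c in zip(mix_weights, comps)]
    top = max(logs)                                    # log-sum-exp for stability
    ex = [math.exp(v - top) for v in logs]
    total = sum(ex)
    return [e / total for e in ex]

# Usage: p = 3 variables, q = 1 factor, second variable missing.
mu0 = [0.0, 0.0, 0.0]
B = [[1.0], [1.0], [0.0]]
D = [1.0, 1.0, 1.0]
logdens, tau, y_hat = tfa_observed_estep([2.0, None, 0.0], mu0, B, D, 4.0)
# tau is approximately 1.0 and y_hat is approximately [2.0, 1.0, 0.0]
```

In this sketch, a small weight tau marks a sample as outlying for that component, which is how the t model downweights outliers in the subsequent M-step. The posterior probabilities assign a partially observed sample to a cluster using only its observed coordinates, without imputing it first.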

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Acknowledgements

The authors gratefully acknowledge the Coordinating Editor, Maurizio Vichi, the Associate Editor and two anonymous referees for their comments and suggestions that greatly improved this paper. We are also grateful to Mr. Meng-Chih Liu for his initial input. W.L. Wang and T.I. Lin acknowledge the support of the Ministry of Science and Technology of Taiwan under Grant Nos. MOST 107-2628-M-035-001-MY3 and MOST 109-2118-M-005-005-MY3.

Author information

Correspondence to Tsung-I Lin.

Supplementary Information

Supplementary material 1 (PDF, 74 KB) is provided as electronic supplementary material.

About this article

Cite this article

Wang, WL., Lin, TI. Robust clustering via mixtures of t factor analyzers with incomplete data. Adv Data Anal Classif 16, 659–690 (2022). https://doi.org/10.1007/s11634-021-00453-8

  • DOI: https://doi.org/10.1007/s11634-021-00453-8
