Robust clustering via mixtures of t factor analyzers with incomplete data

Wang, Wan-Lun; Lin, Tsung-I

doi:10.1007/s11634-021-00453-8

Regular Article
Published: 05 July 2021

Volume 16, pages 659–690, (2022)
Advances in Data Analysis and Classification

Abstract

Mixtures of t factor analyzers (MtFA) are powerful and widely used tools for robust clustering of high-dimensional data in the presence of outliers. However, the occurrence of missing values may cause analytical intractability and computational complexity when fitting the MtFA model. We explicitly derive the score vector and Hessian matrix of the MtFA model with incomplete data to approximate the information matrix. In this regard, some asymptotic properties can be established under certain regularity conditions. Three expectation-maximization-based algorithms are developed for maximum likelihood estimation of the MtFA model when values are possibly missing at random. Practical issues related to the recovery of missing values and clustering of partially observed samples are also investigated. The utility of our methodology is exemplified through the analysis of simulated and real data sets.
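
To illustrate what clustering a partially observed sample can look like, the following is a minimal sketch and not the authors' algorithm. It assumes the standard MtFA parameterization, in which component i has covariance B_i B_i^T + D_i with D_i diagonal, and it relies on the fact that the marginal of a multivariate t distribution over the observed coordinates is again a t distribution with the same degrees of freedom. The function names, the use of None to mark missing entries, and the parameter values in the usage note are all hypothetical choices made for illustration.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def t_logpdf(y, mu, sigma, nu):
    """Log-density of a multivariate t with location mu, scale sigma, nu degrees of freedom."""
    p = len(y)
    low = cholesky(sigma)
    # Forward substitution: solve low @ z = y - mu, so that z.z is the Mahalanobis distance.
    z = []
    for i in range(p):
        z.append((y[i] - mu[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    maha = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(p))
    return (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * p * math.log(nu * math.pi) - 0.5 * logdet
            - 0.5 * (nu + p) * math.log1p(maha / nu))

def mtfa_responsibilities(y, weights, mus, loadings, psis, nus):
    """Posterior component probabilities for one sample, using only its observed entries.

    y        : list of floats, with None marking a missing entry (assumed convention)
    weights  : mixing proportions, one per component
    mus      : component location vectors
    loadings : component factor-loading matrices B_i (p rows, q columns)
    psis     : diagonals of the component error covariances D_i
    nus      : component degrees of freedom
    """
    obs = [j for j, v in enumerate(y) if v is not None]
    y_obs = [y[j] for j in obs]
    log_terms = []
    for w, mu, b, psi, nu in zip(weights, mus, loadings, psis, nus):
        # Observed-coordinate block of B B^T + D.
        sigma_oo = [[sum(b[r][k] * b[c][k] for k in range(len(b[r])))
                     + (psi[r] if r == c else 0.0) for c in obs] for r in obs]
        mu_obs = [mu[j] for j in obs]
        log_terms.append(math.log(w) + t_logpdf(y_obs, mu_obs, sigma_oo, nu))
    # Normalize with the log-sum-exp trick for numerical stability.
    top = max(log_terms)
    unnorm = [math.exp(v - top) for v in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

As a usage illustration with made-up parameters, calling mtfa_responsibilities([0.2, None, -0.1], [0.5, 0.5], [[0, 0, 0], [5, 5, 5]], [[[1.0], [0.5], [0.2]]] * 2, [[1, 1, 1]] * 2, [4, 4]) assigns nearly all posterior mass to the first component, because the two observed coordinates lie close to its location. Estimating the parameters themselves would require the EM-type algorithms described in the article, which this sketch does not attempt to reproduce.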

Acknowledgements

The authors gratefully acknowledge the Coordinating Editor, Maurizio Vichi, the Associate Editor and two anonymous referees for their comments and suggestions that greatly improved this paper. We are also grateful to Mr. Meng-Chih Liu for making some initial inputs. W.L. Wang and T.I. Lin would like to acknowledge the support of the Ministry of Science and Technology of Taiwan under Grant Nos. MOST 107-2628-M-035-001-MY3 and MOST 109-2118-M-005-005-MY3.

Author information

Authors and Affiliations

Department of Statistics, Graduate Institute of Statistics and Actuarial Science, Feng Chia University, Taichung, 40724, Taiwan
Wan-Lun Wang
Institute of Statistics, National Chung Hsing University, Taichung, 402, Taiwan
Tsung-I Lin
Department of Public Health, China Medical University, Taichung, 404, Taiwan
Tsung-I Lin

Corresponding author

Correspondence to Tsung-I Lin.

About this article

Cite this article

Wang, WL., Lin, TI. Robust clustering via mixtures of t factor analyzers with incomplete data. Adv Data Anal Classif 16, 659–690 (2022). https://doi.org/10.1007/s11634-021-00453-8

Received: 04 April 2020
Revised: 05 May 2021
Accepted: 14 June 2021
Published: 05 July 2021
Issue Date: September 2022
DOI: https://doi.org/10.1007/s11634-021-00453-8
