Abstract
We present a parsimonious dual-subspace clustering approach for a mixture of matrix-normal distributions. By assuming certain principal components of the row and column covariance matrices are equally important, we express the model in fewer parameters without sacrificing discriminatory information. We derive update rules for an ECM algorithm and set forth necessary conditions to ensure identifiability. We use simulation to demonstrate parameter recovery, and we illustrate the parsimony and competitive performance of the model through two data analyses.
Funding
Funding was provided by Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada (Grant No. 04444).
Appendix A: Parameters used in model selection simulation
The parameters used to generate observations are as follows. The mean parameters are,
The covariance parameters are the same across groups and across dimensions and are specified as,
where \(i = 1,2\).
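As a minimal sketch of how observations from a two-group mixture of matrix-normal distributions could be generated, the following Python code draws samples using the standard construction X = M + A Z Bᵀ, where A and B are Cholesky factors of the row and column covariance matrices. The function name `sample_matrix_normal`, the dimensions, the means, and the covariance values below are all hypothetical choices made for illustration; they are not the parameter values used in the simulation.

```python
import numpy as np


def sample_matrix_normal(mean, row_cov, col_cov, size, rng):
    """Draw `size` matrices from a matrix-normal distribution.

    Uses X = M + A Z B^T with row_cov = A A^T, col_cov = B B^T and
    Z filled with independent standard normal entries.
    """
    n, p = mean.shape
    a = np.linalg.cholesky(row_cov)  # lower-triangular factor of the row covariance
    b = np.linalg.cholesky(col_cov)  # lower-triangular factor of the column covariance
    z = rng.standard_normal((size, n, p))
    return mean + a @ z @ b.T  # broadcasts over the leading sample axis


# Hypothetical settings, for illustration only (not the paper's values).
rng = np.random.default_rng(0)
n_rows, n_cols, n_per_group = 3, 4, 2000

# Assumed covariance structure: entries 0.5 ** |j - k|, shared by both groups.
idx_r = np.arange(n_rows)
idx_c = np.arange(n_cols)
row_cov = 0.5 ** np.abs(idx_r[:, None] - idx_r[None, :])
col_cov = 0.5 ** np.abs(idx_c[:, None] - idx_c[None, :])

# Assumed group means: all zeros for group 1, all twos for group 2.
means = [np.zeros((n_rows, n_cols)), np.full((n_rows, n_cols), 2.0)]

samples = [
    sample_matrix_normal(m, row_cov, col_cov, n_per_group, rng) for m in means
]
data = np.concatenate(samples)            # stacked observations
labels = np.repeat([0, 1], n_per_group)   # true group memberships
print(data.shape)
```

Sharing the covariance matrices across groups, as in this sketch, means the groups differ only in their means, so any clustering difficulty is controlled by the separation of the mean matrices relative to the covariance scale.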
About this article
Cite this article
Sharp, A., Chalatov, G. & Browne, R.P. A dual subspace parsimonious mixture of matrix normal distributions. Adv Data Anal Classif 17, 801–822 (2023). https://doi.org/10.1007/s11634-022-00526-2
DOI: https://doi.org/10.1007/s11634-022-00526-2