Abstract
A family of parsimonious contaminated shifted asymmetric Laplace mixtures is developed for unsupervised classification of asymmetric clusters in the presence of outliers and noise. A series of constraints is applied to a modified factor analyzer structure of the component scale matrices, yielding a family of twelve models. The modified factor analyzer structure and these parsimonious constraints reduce the number of free parameters that must be estimated, which makes the models effective for the analysis of high-dimensional data. A variant of the expectation-maximization algorithm is developed for parameter estimation, and its convergence issues are discussed and addressed. Popular model selection criteria such as the Bayesian information criterion (BIC) and the integrated completed likelihood (ICL) are used, and a novel modification to the ICL is also considered. Through a series of simulation studies and real data analyses, which include comparisons with well-established methods, we demonstrate the improvements in classification performance obtained with the proposed family of models.
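To make the parameter savings concrete, the short Python sketch below counts free parameters in a single \(p\times p\) scale matrix. An unrestricted matrix has \(p(p+1)/2\) free parameters, whereas a factor analyzer structure \(\varvec{\Lambda }\varvec{\Lambda }'+\varvec{\Psi }\) with a \(p\times q\) loading matrix and diagonal \(\varvec{\Psi }\) has \(pq-q(q-1)/2+p\) once the rotational indeterminacy of \(\varvec{\Lambda }\) is accounted for. Writing the diagonal part as \(\omega \varvec{\Delta }\) with \(|\varvec{\Delta }|=1\), as in modified factor analyzers, leaves this count unchanged (one parameter for \(\omega \), \(p-1\) for \(\varvec{\Delta }\)). The normalization, the function names, and the example dimensions are illustrative assumptions, not the article's own parameter counts for the twelve models.

```python
def full_scale_params(p: int) -> int:
    """Free parameters in an unrestricted symmetric p x p scale matrix."""
    return p * (p + 1) // 2


def factor_analyzer_params(p: int, q: int) -> int:
    """Free parameters in Lambda Lambda' + Psi (Lambda is p x q, Psi diagonal).

    The p*q loadings lose q(q-1)/2 degrees of freedom to rotation;
    the diagonal Psi contributes p more.
    """
    return p * q - q * (q - 1) // 2 + p


# Hypothetical dimensions, chosen only to show how the gap grows with p.
for p, q in [(10, 2), (50, 3), (100, 3)]:
    print(f"p={p:3d}, q={q}: full={full_scale_params(p):5d}, "
          f"factor analyzer={factor_analyzer_params(p, q):4d}")
```

For \(p=100\) and \(q=3\), this gives 5050 versus 397 parameters per component; with \(G\) unconstrained components both counts scale by \(G\).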
Data Availability
All data analyzed in this article are fully presented in the article.
Acknowledgements
We would like to thank the editor and three anonymous referees for their constructive feedback that, in our opinion, helped us improve the paper.
Funding
This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada.
Ethics declarations
Ethics Approval
The research study did not involve any human participants or animals.
Conflict of Interest
The authors declare no competing interests.
Appendix. Updates Used in the AECM Presented in Section 4
A.1 E-Step in the First Alternation
In the E-step of the first alternation of iteration \((k+1)\) of the proposed AECM, the sources of missing data required to compute Eq. 31 are replaced by their expected values \(z^{(k+1)}_{ig}\), \(v^{(k+1)}_{ig}\), \(E^{(k+1)}_{1ig}\), \(E^{(k+1)}_{2ig}\), \(\widetilde{E}^{(k+1)}_{1ig}\), and \(\widetilde{E}^{(k+1)}_{2ig}\), respectively, for \(g=1,\ldots ,G\). Formally, these expected values can be written as follows:
where \(a^{(k)}_g=2+\varvec{\alpha }^{(k)'}_g(\varvec{\Sigma }^{(k)}_g)^{-1}\varvec{\alpha }^{(k)}_g,\ b^{(k)}_{ig}=\delta (\textbf{x}_i,\varvec{\mu }^{(k)}_g\mid \varvec{\Sigma }^{(k)}_g),\ \widetilde{b}^{(k)}_{ig}=\delta (\textbf{x}_i,\varvec{\mu }^{(k)}_g\mid \eta ^{(k)}_g\varvec{\Sigma }^{(k)}_g)\), \(\upsilon = (2-p)/2\), \(\varvec{\Sigma }_g^{(k)} = \varvec{\Lambda }_g^{(k)}\varvec{\Lambda }_g^{(k)'} + \omega _g^{(k)}\varvec{\Delta }_g^{(k)}\), and all other terms are as defined for Eqs. 6–9 in Section 2.1. The closed-form updates for \(E_{1ig}\), \(E_{2ig}\), \(\widetilde{E}_{1ig}\), and \(\widetilde{E}_{2ig}\) follow from the conditional relationship between a standard exponential random variable and a SAL random vector given in Eq. 14: conditional on \(\textbf{x}_i\), the latent exponential variable follows a generalized inverse Gaussian distribution, whose moments are available in closed form.
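The Bessel-function form of these expectations can be sketched numerically. Assuming the usual SAL representation \(\textbf{X}\mid W=w\sim \text {N}_p(\varvec{\mu }+w\varvec{\alpha },w\varvec{\Sigma })\) with \(W\) standard exponential, and taking \(\delta (\textbf{x},\varvec{\mu }\mid \varvec{\Sigma })\) to be the squared Mahalanobis distance, \(W\mid \textbf{x}\) has density proportional to \(w^{\upsilon -1}\exp \{-(aw+b/w)/2\}\). Its moments are \(\mathbb{E}[W\mid \textbf{x}]=\sqrt{b/a}\,R\) and \(\mathbb{E}[1/W\mid \textbf{x}]=\sqrt{a/b}\,R-2\upsilon /b\), with \(R=K_{\upsilon +1}(\sqrt{ab})/K_{\upsilon }(\sqrt{ab})\). The Python sketch below evaluates them for the good-point quantities \(a_g^{(k)}\) and \(b_{ig}^{(k)}\). It assumes that \(E_{1ig}\) and \(E_{2ig}\) denote \(\mathbb{E}[W]\) and \(\mathbb{E}[1/W]\), computes \(K_\nu \) from its integral representation with a plain trapezoid rule instead of a special-function library, and does not cover the contaminated (tilde) versions. The function names are hypothetical.

```python
import math


def _log_cosh(x: float) -> float:
    """Numerically stable log(cosh(x))."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)


def bessel_k_scaled(nu: float, z: float, t_max: float = 25.0, n: int = 5000) -> float:
    """Return exp(z) * K_nu(z) for z > 0 and moderate |nu|.

    Uses K_nu(z) = int_0^inf exp(-z cosh t) cosh(nu t) dt with a trapezoid rule
    on [0, t_max]; the integrand is smooth and even in t, so the rule converges fast.
    """
    if z <= 0.0:
        raise ValueError("z must be positive")
    h = t_max / n
    total = 0.0
    for k in range(n + 1):
        t = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-z * (math.cosh(t) - 1.0) + _log_cosh(nu * t))
    return h * total


def gig_expectations(a: float, b: float, upsilon: float) -> tuple[float, float]:
    """Return (E[W | x], E[1/W | x]) for W | x with density
    proportional to w**(upsilon - 1) * exp(-(a * w + b / w) / 2)."""
    s = math.sqrt(a * b)
    ratio = bessel_k_scaled(upsilon + 1.0, s) / bessel_k_scaled(upsilon, s)
    e_w = math.sqrt(b / a) * ratio
    e_inv_w = math.sqrt(a / b) * ratio - 2.0 * upsilon / b
    return e_w, e_inv_w


# Hypothetical inputs for p = 3: a = 2 + alpha' Sigma^{-1} alpha, b = delta(x_i, mu | Sigma).
p = 3
e_w, e_inv_w = gig_expectations(a=2.5, b=1.7, upsilon=(2 - p) / 2)
print(e_w, e_inv_w)
```

For \(p=3\), \(\upsilon =-1/2\) and \(K_{1/2}=K_{-1/2}\), so the Bessel ratio equals one and the output reduces to \(\sqrt{b/a}\) and \(\sqrt{a/b}+1/b\), which gives a quick sanity check. In practice, an exponentially scaled library routine such as SciPy's `scipy.special.kve` would replace the hand-rolled quadrature.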
A.2 CM-Step 1 in the First Alternation
In the first CM-step of the first alternation of iteration \((k+1)\) of the proposed AECM, the maximum likelihood estimates (MLEs) for the parameters in \(\varvec{\vartheta }_{11}=\{\pi _g,\rho _g,\varvec{\mu }_g,\varvec{\alpha }_g\}_{g=1}^G\) are given by
where
and \(n_g^{(k)} = \sum _{i=1}^n{z_{ig}^{(k)}}\) and \(n_{g,\text {good}}^{(k)} = \sum _{i=1}^n{z_{ig}^{(k)}v_{ig}^{(k)}}\) are as defined for Eq. 31.
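The two simplest updates can be written directly in terms of \(n_g\) and \(n_{g,\text {good}}\). The Python sketch below assumes that the terms of the complete-data log-likelihood involving \(\pi _g\) and \(\rho _g\) are \(\sum _i\sum _g z_{ig}\{\log \pi _g+v_{ig}\log \rho _g+(1-v_{ig})\log (1-\rho _g)\}\). Under that assumption, the unconstrained maximizers are \(\pi _g=n_g/n\) and \(\rho _g=n_{g,\text {good}}/n_g\). Any constraint the article places on \(\rho _g\), and the updates for \(\varvec{\mu }_g\) and \(\varvec{\alpha }_g\), are not reproduced. The function name and the toy expected values are hypothetical.

```python
def update_pi_rho(z, v):
    """Sketch of the pi_g and rho_g updates for a contaminated mixture.

    z[i][g]: expected membership of observation i in component g.
    v[i][g]: expected indicator that observation i is a good point in component g.
    """
    n, G = len(z), len(z[0])
    n_g = [sum(z[i][g] for i in range(n)) for g in range(G)]
    n_good = [sum(z[i][g] * v[i][g] for i in range(n)) for g in range(G)]
    pi = [n_g[g] / n for g in range(G)]           # pi_g = n_g / n
    rho = [n_good[g] / n_g[g] for g in range(G)]  # rho_g = n_{g,good} / n_g
    return pi, rho


# Toy expected values for n = 4 observations and G = 2 components (hypothetical).
z = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
v = [[1.0, 1.0], [0.6, 1.0], [1.0, 0.5], [1.0, 1.0]]
pi, rho = update_pi_rho(z, v)
print(pi, rho)
```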
A.3 E-Step in the Second Alternation
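For a single factor analysis model, as defined in Eq. 18, we have \(\textbf{X}\sim \text {N}_p\left( \varvec{\mu },\varvec{\Lambda }\varvec{\Lambda }'+\varvec{\Psi }\right) \) and \(\textbf{U}\sim \text {N}_q\left( \textbf{0},\textbf{I}_q\right) \), where all terms are as defined for Eq. 18. So, one can show that the joint distribution of \(\textbf{X}\) and \(\textbf{U}\) is
\[
\begin{pmatrix} \textbf{X}\\ \textbf{U} \end{pmatrix} \sim \text {N}_{p+q}\left( \begin{pmatrix} \varvec{\mu }\\ \textbf{0} \end{pmatrix}, \begin{pmatrix} \varvec{\Lambda }\varvec{\Lambda }'+\varvec{\Psi } & \varvec{\Lambda }\\ \varvec{\Lambda }' & \textbf{I}_q \end{pmatrix}\right) .
\]
It follows that the conditional expectations for the latent factor \(\textbf{U}\) and the outer product \(\textbf{U}\textbf{U}'\) are given by
\[
\mathbb{E}\left[ \textbf{U}\mid \textbf{x}\right] = \varvec{\beta }(\textbf{x}-\varvec{\mu }), \qquad \mathbb{E}\left[ \textbf{U}\textbf{U}'\mid \textbf{x}\right] = \textbf{I}_q-\varvec{\beta }\varvec{\Lambda }+\varvec{\beta }(\textbf{x}-\varvec{\mu })(\textbf{x}-\varvec{\mu })'\varvec{\beta }',
\]
where \(\varvec{\beta }=\varvec{\Lambda }'(\varvec{\Lambda }\varvec{\Lambda }'+\varvec{\Psi })^{-1}\). Using these conditional expectations (Eqs. 47 and 48), we can write the expected values required to compute the expected value of Eq. 35 in the E-step of the second alternation. Formally, these expected values can be written as follows: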
where \(z_{ig}^{(k+1/2)}\), \(v^{(k+1/2)}_{ig}\), \(E^{(k+1/2)}_{1ig}\), \(\widetilde{E}^{(k+1/2)}_{1ig}\), \(E^{(k+1/2)}_{2ig}\), and \(\widetilde{E}^{(k+1/2)}_{2ig}\) are computed using the updates given in Appendix A.1, but with the required model parameters replaced with their estimates from iteration \((k+1)\), i.e., from the first alternation of the proposed AECM detailed in Section 4.
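The conditional moments of the latent factor used in this alternation can be computed numerically. The Python sketch below (using NumPy) evaluates \(\mathbb{E}[\textbf{U}\mid \textbf{x}]=\varvec{\beta }(\textbf{x}-\varvec{\mu })\) and \(\mathbb{E}[\textbf{U}\textbf{U}'\mid \textbf{x}]=\textbf{I}_q-\varvec{\beta }\varvec{\Lambda }+\varvec{\beta }(\textbf{x}-\varvec{\mu })(\textbf{x}-\varvec{\mu })'\varvec{\beta }'\) for a single factor analysis model with diagonal \(\varvec{\Psi }\). It obtains \(\varvec{\beta }\) through the Woodbury-type identity \(\varvec{\Lambda }'(\varvec{\Lambda }\varvec{\Lambda }'+\varvec{\Psi })^{-1}=(\textbf{I}_q+\varvec{\Lambda }'\varvec{\Psi }^{-1}\varvec{\Lambda })^{-1}\varvec{\Lambda }'\varvec{\Psi }^{-1}\), so that only a \(q\times q\) system is solved. It leaves out the weighting by \(z_{ig}\), \(v_{ig}\), and the \(E\) terms used in the second alternation. The function name and the random test values are hypothetical.

```python
import numpy as np


def fa_latent_moments(x, mu, lam, psi):
    """Conditional moments of U given x for X = mu + lam U + e.

    Assumes U ~ N_q(0, I_q) and e ~ N_p(0, diag(psi)) independent of U.
    Returns beta, E[U | x] and E[U U' | x].
    """
    q = lam.shape[1]
    psi_inv_lam = lam / psi[:, None]                      # Psi^{-1} Lambda (p x q)
    m = np.eye(q) + lam.T @ psi_inv_lam                   # I_q + Lambda' Psi^{-1} Lambda
    beta = np.linalg.solve(m, psi_inv_lam.T)              # (q x p) via Woodbury
    e_u = beta @ (x - mu)                                 # E[U | x]
    e_uu = np.eye(q) - beta @ lam + np.outer(e_u, e_u)    # E[U U' | x]
    return beta, e_u, e_uu


# Hypothetical example: compare the Woodbury route with the direct p x p inverse.
rng = np.random.default_rng(0)
p, q = 6, 2
lam = rng.normal(size=(p, q))
psi = rng.uniform(0.5, 1.5, size=p)
mu = rng.normal(size=p)
x = rng.normal(size=p)
beta, e_u, e_uu = fa_latent_moments(x, mu, lam, psi)
beta_direct = lam.T @ np.linalg.inv(lam @ lam.T + np.diag(psi))
print(np.allclose(beta, beta_direct))  # True
```

The design choice matters in high dimensions: the direct route inverts a \(p\times p\) matrix, while the Woodbury route inverts only the diagonal \(\varvec{\Psi }\) and a \(q\times q\) matrix.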
A.4 Illustrative Example of the CM-Step in the Second Alternation
To fit the CCUU model, the CM-Step of the second alternation of the proposed AECM is as follows. First, recall that the CCUU model scale matrix has the form
where all terms are as defined for Eqs. 18 and 29 for \(g=1,\ldots ,G\). Let
with \(\varvec{\beta }^{(k)}\) defined according to the imposed constraints. Maximizing Eq. 37 gives the following updating equations:
where
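The scale-matrix structure \(\varvec{\Sigma }_g=\varvec{\Lambda }_g\varvec{\Lambda }_g'+\omega _g\varvec{\Delta }_g\) used throughout this appendix can be illustrated with a short sketch. It assumes, as in the modified factor analyzer literature, that \(\varvec{\Delta }_g\) is diagonal with \(|\varvec{\Delta }_g|=1\). Under that assumption, any positive diagonal matrix \(\varvec{\Psi }_g\) splits uniquely as \(\omega _g\varvec{\Delta }_g\) with \(\omega _g=|\varvec{\Psi }_g|^{1/p}\), the geometric mean of its diagonal. The Python sketch below performs that split and assembles \(\varvec{\Sigma }_g\). It does not reproduce the CCUU updating equations, and the function names and numbers are hypothetical.

```python
import math


def split_uniqueness(psi):
    """Write diag(psi) as omega * Delta with Delta diagonal and det(Delta) = 1."""
    p = len(psi)
    omega = math.exp(sum(math.log(s) for s in psi) / p)  # geometric mean = det(Psi)^(1/p)
    delta = [s / omega for s in psi]
    return omega, delta


def modified_fa_scale(lam, omega, delta):
    """Sigma = Lambda Lambda' + omega * Delta; lam is a list of p rows, each of length q."""
    p = len(lam)
    sigma = [[sum(a * b for a, b in zip(lam[i], lam[j])) for j in range(p)] for i in range(p)]
    for i in range(p):
        sigma[i][i] += omega * delta[i]
    return sigma


# Hypothetical p = 2, q = 1 example.
omega, delta = split_uniqueness([2.0, 8.0])
sigma = modified_fa_scale([[1.0], [2.0]], omega, delta)
print(omega, delta, sigma)
```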
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
McLaughlin, P., Franczak, B.C. & Kashlak, A.B. Unsupervised Classification with a Family of Parsimonious Contaminated Shifted Asymmetric Laplace Mixtures. J Classif 41, 65–93 (2024). https://doi.org/10.1007/s00357-023-09460-0
DOI: https://doi.org/10.1007/s00357-023-09460-0