
Model-based clustering with determinant-and-shape constraint

Statistics and Computing

Abstract

Model-based approaches to cluster analysis and mixture modeling often involve maximizing classification and mixture likelihoods. Without appropriate constraints on the scatter matrices of the components, these maximizations are ill-posed problems. Moreover, without constraints, uninteresting or “spurious” clusters are often detected by the EM and CEM algorithms traditionally used to maximize the likelihood criteria. Imposing an upper bound on the maximal ratio between the determinants of the scatter matrices seems a sensible way to overcome these problems through affine equivariant constraints. Unfortunately, problems still arise unless the elements of the “shape” matrices are also controlled. A new methodology is proposed that controls both the determinants of the scatter matrices and the elements of the shape matrices. Some theoretical justification is given. A fast algorithm is proposed for this doubly constrained maximization. The methodology is also extended to robust model-based clustering problems.
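To make the two kinds of constraint concrete, the following minimal sketch checks a set of bivariate scatter matrices against a determinant bound and a shape bound. It is an illustration under stated assumptions, not the algorithm of the paper: the abstract does not define the shape matrix, so the sketch assumes the common convention that the shape part is the scatter matrix rescaled to unit determinant, and it measures shape by the ratio of the largest to the smallest eigenvalue. The names `c_det` and `c_shape`, and all function names, are hypothetical.

```python
import math


def eigenvalues_2x2(s):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = s
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean + radius, mean - radius


def determinant_and_shape(s):
    """Split a 2x2 scatter matrix into its determinant and the eigenvalues
    of its unit-determinant shape part (assumed convention, see lead-in)."""
    big, small = eigenvalues_2x2(s)
    det = big * small
    scale = math.sqrt(det)  # |S|^(1/p) with p = 2
    return det, (big / scale, small / scale)


def constraint_ratios(scatters):
    """Return (ratio of largest to smallest determinant across components,
    worst within-component ratio of shape eigenvalues)."""
    parts = [determinant_and_shape(s) for s in scatters]
    dets = [det for det, _ in parts]
    det_ratio = max(dets) / min(dets)
    shape_ratio = max(gamma[0] / gamma[1] for _, gamma in parts)
    return det_ratio, shape_ratio


def satisfies(scatters, c_det, c_shape):
    """True if both hypothetical bounds hold for the given scatter matrices."""
    det_ratio, shape_ratio = constraint_ratios(scatters)
    return det_ratio <= c_det and shape_ratio <= c_shape


# Two components with (numerically) equal determinants: a spherical one and
# a very elongated one. The determinant ratio alone does not flag the
# elongated component; the shape ratio does.
scatters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[100.0, 0.0], [0.0, 0.01]],
]
det_ratio, shape_ratio = constraint_ratios(scatters)
print(det_ratio, shape_ratio)
print(satisfies(scatters, c_det=10.0, c_shape=10.0))
```

In this toy case both components have determinant close to 1, so a bound on the determinant ratio is satisfied even though the second component is almost degenerate along one direction. This is the situation the abstract describes when it says that problems remain unless the shape matrices are also controlled.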



Author information

Corresponding author

Correspondence to Luis Angel García-Escudero.

Additional information

This research is partially supported by the Spanish Ministerio de Economía y Competitividad, Grant MTM2017-86061-C2-1-P, and by Consejería de Educación de la Junta de Castilla y León and FEDER, Grants VA005P17 and VA002G18. This research benefits from the HPC (High Performance Computing) facility of the University of Parma, Italy. M.R. gratefully acknowledges support from the CRoNoS project, reference CRoNoS COST Action IC1408, and the University of Parma project “Statistics for fraud detection, with applications to trade data and financial statement”. The authors also thank the editor, the associate editor, and the anonymous referees for their constructive comments.

Cite this article

García-Escudero, L.A., Mayo-Iscar, A. & Riani, M. Model-based clustering with determinant-and-shape constraint. Stat Comput 30, 1363–1380 (2020). https://doi.org/10.1007/s11222-020-09950-w


  • DOI: https://doi.org/10.1007/s11222-020-09950-w
