Measuring the component overlapping in the Gaussian mixture model

Sun, Haojun; Wang, Shengrui

doi:10.1007/s10618-011-0212-3

Published: 27 February 2011

Volume 23, pages 479–502 (2011)

Data Mining and Knowledge Discovery

Haojun Sun¹ & Shengrui Wang²

Abstract

The ability of a clustering algorithm to deal with overlapping clusters is a major indicator of its efficiency. However, the phenomenon of cluster overlapping is still not well characterized mathematically, especially in multivariate cases. In this paper, we are interested in the overlap phenomenon between Gaussian clusters, since the Gaussian mixture is a fundamental data distribution model suitable for many clustering algorithms. We introduce the novel concept of the ridge curve and establish a theory on the degree of overlap between two components. Based on this theory, we develop an algorithm for calculating the overlap rate. As an example, we use this algorithm to calculate the overlap rates between the classes in the IRIS data set and clear up some of the confusion as to the true number of classes in the data set. We investigate factors that affect the value of the overlap rate, and show how the theory can be used to generate “truthed data” as well as to measure the overlap rate between a given pair of clusters or components in a mixture. Finally, we show an example application of the theory to the evaluation of well-known clustering algorithms.
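
To give a rough feel for what a degree-of-overlap measure between two Gaussian components can look like, the following minimal sketch handles only the one-dimensional case. It is not the algorithm from the paper, whose ridge-curve construction is not reproduced on this page. It assumes, purely for illustration, that overlap is scored as the ratio of the mixture density at the valley between the two peaks to the density at the lower peak, and that a mixture with a single peak scores 1. The function names (`normal_pdf`, `mixture_pdf`, `overlap_ratio_1d`) and the grid-search approach are hypothetical choices made here.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, stds):
    """Density of a univariate Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


def overlap_ratio_1d(weights, means, stds, n_grid=2001):
    """Assumed overlap score for a two-component 1D mixture.

    Returns (density at the valley) / (density at the lower peak),
    or 1.0 when no valley is found between the two means.
    This is an illustrative definition, not the paper's algorithm.
    """
    lo, hi = min(means), max(means)
    if lo == hi:
        return 1.0  # identical means: treated as fully overlapping

    # Evaluate the mixture density on a grid spanning the two means.
    xs = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    ps = [mixture_pdf(x, weights, means, stds) for x in xs]

    # Interior grid points that are local minima of the density.
    valleys = [
        i for i in range(1, n_grid - 1)
        if ps[i] < ps[i - 1] and ps[i] <= ps[i + 1]
    ]
    if not valleys:
        return 1.0  # single peak on the segment between the means

    i = min(valleys, key=lambda k: ps[k])
    lower_peak = min(max(ps[:i]), max(ps[i + 1:]))
    return ps[i] / lower_peak
```

Under this assumed definition, two equal-weight unit-variance components with means 0 and 1 merge into a single peak and score 1, while the same components with means 0 and 10 score close to 0. A multivariate version would need to locate the valley and peaks in higher dimensions, which is the kind of problem the ridge curve mentioned in the abstract is concerned with.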

Author information

Authors and Affiliations

College of Engineering, Shantou University, Shantou, 515063, Guangdong, China
Haojun Sun
Département d’informatique, Université de Sherbrooke, Sherbrooke, QC, J1K 2R1, Canada
Shengrui Wang

Corresponding author

Correspondence to Haojun Sun.

Additional information

Responsible editor: Charu Aggarwal.

The expression “simulation data” is used in this paper to designate a data set with known membership of data points w.r.t. each cluster.

About this article

Cite this article

Sun, H., Wang, S. Measuring the component overlapping in the Gaussian mixture model. Data Min Knowl Disc 23, 479–502 (2011). https://doi.org/10.1007/s10618-011-0212-3

Received: 20 May 2009
Accepted: 27 January 2011
Published: 27 February 2011
Issue Date: November 2011
DOI: https://doi.org/10.1007/s10618-011-0212-3
