A Bounded Index for Cluster Validity

Saitta, Sandro; Raphael, Benny; Smith, Ian F. C.

doi:10.1007/978-3-540-73499-4_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4571)

Conference series: International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

Clustering is one of the best-known types of unsupervised learning. Evaluating the quality of clustering results and determining the number of clusters in the data are important issues. Most current validity indices cover only a subset of the important aspects of clusters. Moreover, these indices are relevant only for data sets containing at least two clusters. In this paper, a new bounded index for cluster validity, called the score function (SF), is introduced. The score function is based on standard cluster properties. Several artificial and real-life data sets are used to evaluate its performance, and it is tested against four existing validity indices. The proposed index is found to be always as good as or better than these indices in the case of hyperspheroidal clusters. It is shown to work well on multidimensional data sets and is able to accommodate unique-cluster and sub-cluster cases.
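The abstract does not state the formula for SF. As a hedged illustration only, the sketch below assumes SF combines a between-class distance (bcd) and a within-class distance (wcd) as SF = 1 - 1/exp(exp(bcd - wcd)), which stays strictly inside (0, 1) for any real value of bcd - wcd and therefore gives a bounded score. These definitions, the Euclidean distance, and the helper names `score_function` and `_centroid` are assumptions for illustration, to be checked against the full chapter rather than read as a quotation of it. Because nothing in the assumed expression requires k >= 2, the sketch also returns a value for a single-cluster labeling. Only the Python standard library is used.

```python
import math
from collections import defaultdict


def _centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def score_function(points, labels):
    """Hypothetical sketch of a bounded validity score; higher is better.

    ASSUMED form (not stated in the abstract; check the full chapter):
      bcd = sum_i n_i * ||z_i - z_tot|| / (n * k)   between-class distance
      wcd = sum_i mean_{x in C_i} ||x - z_i||       within-class distance
      SF  = 1 - 1 / exp(exp(bcd - wcd))
    where z_i is the centroid of cluster C_i (size n_i), z_tot the centroid
    of all n points, k the number of clusters, ||.|| the Euclidean norm.
    """
    if not points or len(points) != len(labels):
        raise ValueError("points and labels must be non-empty and equally long")

    # Group the points by their cluster label.
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)

    n, k = len(points), len(clusters)
    z_tot = _centroid(points)

    bcd = 0.0  # size-weighted spread of cluster centroids around z_tot
    wcd = 0.0  # sum over clusters of the mean point-to-centroid distance
    for members in clusters.values():
        z_i = _centroid(members)
        bcd += len(members) * math.dist(z_i, z_tot)
        wcd += sum(math.dist(x, z_i) for x in members) / len(members)
    bcd /= n * k

    # 1 - 1/exp(exp(d)) == -expm1(-exp(d)); this form avoids overflow, and
    # clipping d keeps math.exp(d) finite for extreme inputs.
    d = min(bcd - wcd, 700.0)
    return -math.expm1(-math.exp(d))


# Toy 2-D data with two well-separated groups (made up for illustration).
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = [0, 0, 0, 1, 1, 1]  # labels that match the two groups
bad = [0, 1, 0, 1, 0, 1]   # labels that mix the groups
one = [0] * len(pts)       # single-cluster labeling (k = 1)

for name, labels in [("good", good), ("bad", bad), ("one", one)]:
    print(name, round(score_function(pts, labels), 4))
```

On this toy data the labeling that matches the two groups should score close to 1 and the mixed labeling close to 0. To use such a score for choosing the number of clusters, one would presumably run a clustering algorithm (k-means, for example) for several values of k and keep the partition with the highest score; that selection loop is not shown here.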

Author information

Authors and Affiliations

Ecole Polytechnique Fédérale de Lausanne (EPFL), Station 18, 1015 Lausanne, Switzerland
Sandro Saitta, Benny Raphael & Ian F. C. Smith

Editor information

Petra Perner

About this paper

Cite this paper

Saitta, S., Raphael, B., Smith, I.F.C. (2007). A Bounded Index for Cluster Validity. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol. 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_14

DOI: https://doi.org/10.1007/978-3-540-73499-4_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science, Computer Science (R0)
