Abstract
Clustering is one of the best-known types of unsupervised learning. Evaluating the quality of clustering results and determining the number of clusters in a data set are important issues. Most current validity indices cover only a subset of the important aspects of clusters. Moreover, these indices are relevant only for data sets containing at least two clusters. In this paper, a new bounded index for cluster validity, called the score function (SF), is introduced. The score function is based on standard cluster properties. Several artificial and real-life data sets are used to evaluate its performance, and it is tested against four existing validity indices. For hyperspheroidal clusters, the proposed index is always found to perform as well as or better than these indices. It is also shown to work well on multidimensional data sets and to accommodate unique-cluster and sub-cluster cases.
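The abstract does not give the formula for the score function. As an illustration only, the sketch below assumes a bounded index built from a between-class distance `bcd` (centroid-to-global-mean distances weighted by cluster size, divided by n·k) and a within-class distance `wcd` (sum over clusters of the mean point-to-centroid distance), mapped into (0, 1) by a double exponential, SF = 1 − exp(−exp(bcd − wcd)). Treat this form and every helper name here (`score_function`, `kmeans`, `_mean`) as assumptions; consult the full paper for the authoritative definition. The demo runs a plain k-means (with a deterministic farthest-point seeding chosen here for reproducibility) for k = 1…4 and keeps the k with the highest score. Because the index is defined for k = 1, a single-cluster partition can be scored like any other.

```python
import math


def _mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def score_function(clusters):
    """Assumed bounded validity score; higher means a better partition.

    bcd: size-weighted distances of centroids to the global mean, divided by n*k.
    wcd: sum over clusters of the mean point-to-centroid distance.
    """
    clusters = [c for c in clusters if c]  # ignore empty clusters
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    z_tot = _mean([p for c in clusters for p in c])
    centroids = [_mean(c) for c in clusters]
    bcd = sum(math.dist(z, z_tot) * len(c) for z, c in zip(centroids, clusters)) / (n * k)
    wcd = sum(sum(math.dist(p, z) for p in c) / len(c) for z, c in zip(centroids, clusters))
    # 1 - exp(-exp(x)) lies in (0, 1) for every real x; in floating point the
    # result can round to exactly 0.0 or 1.0 when |bcd - wcd| is large.
    return 1.0 - math.exp(-math.exp(bcd - wcd))


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point seeding (assumed)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # next seed: the point farthest from its nearest existing seed
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # keep the old center if a cluster ends up empty
        new_centers = [_mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


# Two tight, well-separated blobs (hypothetical toy data).
blob_a = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (0.25, 0.25)]
blob_b = [(x + 10.0, y + 10.0) for x, y in blob_a]
data = blob_a + blob_b

scores = {k: score_function(kmeans(data, k)) for k in range(1, 5)}
best_k = max(scores, key=scores.get)
print(best_k)  # expected 2 for these two well-separated blobs
```

Dividing `bcd` by k penalizes splitting a compact cluster, while `wcd` penalizes merging separated ones, so the score peaks at the natural grouping in this toy case; whether that holds for other data is exactly what the paper's experiments evaluate.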
Keywords
- clustering
- cluster validity
- validity index
- k-means
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Saitta, S., Raphael, B., Smith, I.F.C. (2007). A Bounded Index for Cluster Validity. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol. 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_14
DOI: https://doi.org/10.1007/978-3-540-73499-4_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science, Computer Science (R0)