A Bounded Index for Cluster Validity

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4571)

Abstract

Clustering is one of the best-known types of unsupervised learning. Evaluating the quality of results and determining the number of clusters in data are important issues. Most current validity indices cover only a subset of the important aspects of clusters. Moreover, these indices are relevant only for data sets containing at least two clusters. In this paper, a new bounded index for cluster validity, called the score function (SF), is introduced. The score function is based on standard cluster properties. Several artificial and real-life data sets are used to evaluate its performance, and it is tested against four existing validity indices. In the case of hyperspheroidal clusters, the proposed index is found to be always as good as or better than these indices. It is shown to work well on multi-dimensional data sets and is able to accommodate unique and sub-cluster cases.
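The abstract does not state the formula of the score function, so the following is only a minimal sketch of what a bounded index built from standard cluster properties could look like. The specific form used here (a double exponential applied to the difference between a between-cluster distance and a within-cluster distance) is an assumption made for illustration and should not be read as the paper's definition. The names `centroid`, `bounded_index`, `bcd` and `wcd` are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def bounded_index(clusters):
    """Hypothetical bounded validity index in (0, 1); higher means better.

    clusters: list of non-empty clusters, each a list of points (tuples).
    The formula is an illustrative assumption, not taken from the paper.
    """
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    z_tot = centroid(all_points)
    centroids = [centroid(c) for c in clusters]

    # Separation: size-weighted distance from each centroid to the global centroid.
    bcd = sum(math.dist(z, z_tot) * len(c)
              for z, c in zip(centroids, clusters)) / (n * k)

    # Compactness: sum over clusters of the mean point-to-centroid distance.
    wcd = sum(sum(math.dist(p, z) for p in c) / len(c)
              for z, c in zip(centroids, clusters))

    # The double exponential maps (bcd - wcd) into the open interval (0, 1).
    # Assumes features are scaled so that exp(bcd - wcd) does not overflow.
    return 1.0 - math.exp(-math.exp(bcd - wcd))


# Two compact, well-separated groups versus a mixed-up partition of the same data.
good = [[(0, 0), (0, 1), (1, 0), (1, 1)],
        [(10, 10), (10, 11), (11, 10), (11, 11)]]
bad = [[(0, 0), (10, 10), (0, 1), (11, 11)],
       [(1, 0), (1, 1), (10, 11), (11, 10)]]
single = [[p for c in good for p in c]]  # everything in one cluster

print(bounded_index(good), bounded_index(bad), bounded_index(single))
```

Under these assumptions the value stays strictly between 0 and 1, and the function remains defined when all points are placed in a single cluster, a case the abstract notes that most existing indices do not handle.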

Editor information

Petra Perner

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Saitta, S., Raphael, B., Smith, I.F.C. (2007). A Bounded Index for Cluster Validity. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_14

  • DOI: https://doi.org/10.1007/978-3-540-73499-4_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73498-7

  • Online ISBN: 978-3-540-73499-4

  • eBook Packages: Computer Science, Computer Science (R0)
