A New Efficient and Unbiased Approach for Clustering Quality Evaluation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7104)

Abstract

Traditional clustering quality indexes (Inertia, DB, …) are known to be method-dependent and fail to properly estimate clustering quality in several cases, notably for complex data such as textual data. We therefore propose an alternative approach to clustering quality evaluation based on unsupervised measures of Recall, Precision and F-measure that exploit the descriptors of the data associated with the obtained clusters. Two categories of indexes are proposed: Macro and Micro indexes. This paper also focuses on the construction of a new cumulative Micro precision index that makes it possible to evaluate the overall quality of a clustering result while clearly distinguishing between homogeneous and heterogeneous, or degenerated, results. The behavior of the classical indexes is compared experimentally with that of our new approach on a polythematic dataset of bibliographical references drawn from the PASCAL database.
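
The abstract does not give the formulas, so the following is only a rough sketch of how descriptor-based Recall and Precision might be averaged in a Macro and a Micro fashion. Everything in it is an assumption made for illustration and not the paper's definition: the function name `label_based_scores`, the rule that a descriptor is attached to the cluster where it occurs most often, and the per-descriptor ratios used for recall and precision. It also assumes at least one document carries a descriptor.

```python
from collections import Counter, defaultdict


def label_based_scores(docs, assignment):
    """docs: list of descriptor sets; assignment: cluster id per document.

    Hypothetical sketch; the definitions below are assumptions.
    """
    cluster_size = Counter(assignment)
    total = Counter()                   # descriptor -> documents carrying it
    in_cluster = defaultdict(Counter)   # cluster -> descriptor -> count
    for descriptors, c in zip(docs, assignment):
        for p in descriptors:
            total[p] += 1
            in_cluster[c][p] += 1

    # Assumption: attach each descriptor to the cluster where it is most
    # frequent (ties go to the smallest cluster id).
    owned = defaultdict(list)
    for p in total:
        best = max(sorted(cluster_size), key=lambda c: in_cluster[c][p])
        owned[best].append(p)

    # Per-descriptor ratios, grouped by owning cluster.
    prec = {c: [in_cluster[c][p] / cluster_size[c] for p in ps]
            for c, ps in owned.items()}
    rec = {c: [in_cluster[c][p] / total[p] for p in ps]
           for c, ps in owned.items()}

    def macro(per_cluster):
        # Average inside each cluster first, then across clusters.
        means = [sum(v) / len(v) for v in per_cluster.values()]
        return sum(means) / len(means)

    def micro(per_cluster):
        # Average directly over all (cluster, descriptor) pairs.
        flat = [x for v in per_cluster.values() for x in v]
        return sum(flat) / len(flat)

    def f_measure(r, p):
        return 2 * r * p / (r + p) if r + p else 0.0

    out = {
        "macro_recall": macro(rec), "macro_precision": macro(prec),
        "micro_recall": micro(rec), "micro_precision": micro(prec),
    }
    out["macro_f"] = f_measure(out["macro_recall"], out["macro_precision"])
    out["micro_f"] = f_measure(out["micro_recall"], out["micro_precision"])
    return out


# Cluster 0 is homogeneous; cluster 1 mixes two descriptors.
scores = label_based_scores([{"a"}, {"a"}, {"b"}, {"c"}], [0, 0, 1, 1])
```

In this toy case the Macro precision weights both clusters equally, while the Micro precision weights each descriptor equally, so the heterogeneous cluster pulls the Micro value lower.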


© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lamirel, JC., Cuxac, P., Mall, R., Safi, G. (2012). A New Efficient and Unbiased Approach for Clustering Quality Evaluation. In: Cao, L., Huang, J.Z., Bailey, J., Koh, Y.S., Luo, J. (eds) New Frontiers in Applied Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 7104. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28320-8_18

  • DOI: https://doi.org/10.1007/978-3-642-28320-8_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28319-2

  • Online ISBN: 978-3-642-28320-8

  • eBook Packages: Computer Science, Computer Science (R0)
