Information Retrieval, Volume 12, Issue 4, pp 461–486

A comparison of extrinsic clustering evaluation metrics based on formal constraints

  • Enrique Amigó
  • Julio Gonzalo
  • Javier Artiles
  • Felisa Verdejo


A wide range of evaluation metrics is available for comparing the quality of text clustering algorithms. In this article, we define a few intuitive formal constraints on such metrics, which shed light on which aspects of clustering quality are captured by different metric families. These formal constraints are validated in an experiment involving human assessments, and compared with other constraints proposed in the literature. Our analysis of a wide range of metrics shows that only BCubed satisfies all formal constraints. We also extend the analysis to the problem of overlapping clustering, where items can simultaneously belong to more than one cluster. As BCubed cannot be directly applied to this task, we propose a modified version of BCubed that avoids the problems found with other metrics.


Keywords: Clustering, Evaluation metrics, Formal constraints



This work has been partially supported by research grants QEAVIS (TIN2007-67581-C02-01) and INES/Text-Mess (TIN2006-15265-C06-02) from the Spanish government. We are indebted to Fernando López-Ostenero and three anonymous reviewers for their comments on earlier versions of this work, and to Paul Kalmar for suggesting the cheat strategy for the overlapping clustering task.



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Enrique Amigó (1)
  • Julio Gonzalo (1)
  • Javier Artiles (1)
  • Felisa Verdejo (1)

  1. Departamento de Lenguajes y Sistemas Informáticos, UNED, Madrid, Spain
