Frontiers of Computer Science, Volume 6, Issue 5, pp 568–580

The ClasSi coefficient for the evaluation of ranking quality in the presence of class similarities

  • Anca Maria Ivanescu
  • Marc Wichterich
  • Christian Beecks
  • Thomas Seidl
Research Article

Abstract

Evaluation measures play an important role in the design of new approaches, and quality is often measured by assessing the relevance of the obtained result set. While many evaluation measures built on precision and recall assume a binary relevance model, ranking correlation coefficients are better suited for multi-class problems. State-of-the-art ranking correlation coefficients such as Kendall’s τ and Spearman’s ρ do not allow the user to specify similarities between differing object classes and thus treat the transposition of objects from similar classes the same way as that of objects from dissimilar classes. We propose ClasSi, a new ranking correlation coefficient that deals with class label rankings and employs a class distance function to model the similarities between the classes. We also introduce a graphical representation of ClasSi that describes how the correlation evolves throughout the ranking.
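The limitation described in the abstract can be illustrated with a small sketch. It computes Kendall's τ (the tau-a variant) for two rankings that each contain a single adjacent transposition, one between similar classes and one between dissimilar classes. The class labels, the distance values in `CLASS_DISTANCE`, and the helper `weighted_discordance` are hypothetical and chosen only for illustration; in particular, `weighted_discordance` is not the ClasSi coefficient defined in the paper, just one simple way a class distance could enter such a measure.

```python
from itertools import combinations

def kendall_tau(perm):
    """Kendall's tau-a between the ideal order 0..n-1 and `perm`.

    perm[i] is the ideal position of the object that was ranked i-th.
    """
    n = len(perm)
    pairs = n * (n - 1) // 2
    discordant = sum(1 for i, j in combinations(range(n), 2) if perm[i] > perm[j])
    return (pairs - 2 * discordant) / pairs

# Hypothetical class labels of the objects in their ideal order.
IDEAL_LABELS = ["cat", "lion", "car", "truck"]

# Hypothetical class distances (assumption: small for similar classes).
CLASS_DISTANCE = {
    frozenset(("cat", "lion")): 0.1,
    frozenset(("car", "truck")): 0.1,
}

def class_distance(a, b):
    """Return 0 for identical classes, 1.0 for pairs not listed above."""
    if a == b:
        return 0.0
    return CLASS_DISTANCE.get(frozenset((a, b)), 1.0)

def weighted_discordance(perm, labels=IDEAL_LABELS):
    """Illustrative penalty: sum of class distances over discordant pairs.

    This is NOT the ClasSi definition, only a sketch of the motivation.
    """
    return sum(
        class_distance(labels[perm[i]], labels[perm[j]])
        for i, j in combinations(range(len(perm)), 2)
        if perm[i] > perm[j]
    )

swap_similar = [1, 0, 2, 3]      # cat and lion transposed
swap_dissimilar = [0, 2, 1, 3]   # lion and car transposed

tau_similar = kendall_tau(swap_similar)
tau_dissimilar = kendall_tau(swap_dissimilar)
penalty_similar = weighted_discordance(swap_similar)
penalty_dissimilar = weighted_discordance(swap_dissimilar)
```

Under these assumptions, both rankings receive the same τ because each has exactly one discordant pair, while the distance-weighted penalty separates the two cases.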

Keywords

ranking, quality measure, class similarity, ClasSi

Copyright information

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Anca Maria Ivanescu (1)
  • Marc Wichterich (1)
  • Christian Beecks (1)
  • Thomas Seidl (1)
  1. Data Management and Data Exploration Group, RWTH Aachen University, Aachen, Germany
