Evaluating Information Retrieval System Performance Based on Multi-grade Relevance

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4994)

Abstract

One of the challenges of modern information retrieval is to rank the most relevant documents at the top of a large system output, which calls for choosing proper methods to evaluate system performance. Traditional performance measures, such as precision and recall, cannot distinguish between different levels of relevance because they are based only on binary relevance judgments. The main objective of this paper is to review ten existing evaluation methods based on multi-grade relevance and to compare their similarities and differences through theoretical and numerical examination. We find that the normalized distance performance measure is the best choice in terms of sensitivity to document rank order and of crediting systems for retrieving highly relevant documents. The cumulated gain-based methods rely on the total relevance score and are not sufficiently sensitive to document rank order.
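
To make the comparison concrete, the sketch below contrasts one measure from each of the two families the abstract discusses: normalized discounted cumulated gain (nDCG), following Järvelin and Kekäläinen's gain-based definition, and Yao's normalized distance performance measure (NDPM). This is a minimal illustration, not code from the paper; the toy grade lists and the log2(i + 1) discount are assumptions (the discount base is a tunable parameter in the original nDCG formulation).

    import math
    from itertools import combinations

    def dcg(grades):
        # Discounted cumulated gain of graded relevance scores listed in
        # system rank order, using the common log2(i + 1) discount.
        return sum(g / math.log2(i + 2) for i, g in enumerate(grades))

    def ndcg(grades):
        # DCG normalized by the DCG of the ideal (descending) ranking.
        ideal = dcg(sorted(grades, reverse=True))
        return dcg(grades) / ideal if ideal > 0 else 0.0

    def ndpm(grades):
        # Normalized distance performance measure over document pairs.
        # The user strictly prefers the higher-graded document of a pair;
        # a pair costs 2 if the system ranks it the opposite way and 1 if
        # the system ties it (never here, since a ranked list has no ties).
        contradicted = 0
        preferred = 0
        for i, j in combinations(range(len(grades)), 2):
            if grades[i] == grades[j]:
                continue  # user is indifferent: pair is not counted
            preferred += 1
            if grades[i] < grades[j]:
                contradicted += 1
        return (2 * contradicted) / (2 * preferred) if preferred else 0.0

    # Two hypothetical runs over the same documents (grades 0-3),
    # differing only in rank order.
    run_a = [3, 2, 3, 0, 1]   # highly relevant documents near the top
    run_b = [1, 0, 3, 2, 3]   # same grades, poorer order
    for name, run in (("run A", run_a), ("run B", run_b)):
        print(f"{name}: nDCG = {ndcg(run):.3f}, NDPM = {ndpm(run):.3f}")

Note that nDCG is a gain (higher is better) while NDPM is a distance (lower is better): run A scores roughly 0.97 versus 0.72 on nDCG and 0.22 versus 0.78 on NDPM. Both measures prefer the run that places highly relevant documents first, but NDPM reacts only to pairwise rank order, whereas nDCG also depends on the total relevance mass retrieved.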



Editor information

Aijun An, Stan Matwin, Zbigniew W. Raś, Dominik Ślęzak

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhou, B., Yao, Y. (2008). Evaluating Information Retrieval System Performance Based on Multi-grade Relevance. In: An, A., Matwin, S., Raś, Z.W., Ślęzak, D. (eds) Foundations of Intelligent Systems. ISMIS 2008. Lecture Notes in Computer Science (LNAI), vol 4994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68123-6_46

  • DOI: https://doi.org/10.1007/978-3-540-68123-6_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68122-9

  • Online ISBN: 978-3-540-68123-6

  • eBook Packages: Computer Science, Computer Science (R0)
