Evaluation of System Measures for Incomplete Relevance Judgment in IR

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4027))

Abstract

Incomplete relevance judgment has become the norm in major information retrieval evaluation events such as TREC, but its effect on system measures is not well understood. In this paper, we evaluate four system measures under incomplete relevance judgment: mean average precision, R-precision, normalized average precision over all documents, and normalized discounted cumulative gain. Among them, normalized average precision over all documents is newly introduced, and both mean average precision and R-precision are generalized to graded relevance judgment. The four measures share a common characteristic: complete relevance judgment is required to compute their exact values. We investigate these measures empirically through extensive experiments on TREC data to determine the effect of incomplete relevance judgment on them. From these experiments, we conclude that incomplete relevance judgment significantly affects the values of all four measures: under the pooling method used in TREC, the more incomplete the relevance judgment, the higher the values of all these measures usually become. We also conclude that mean average precision is the most sensitive but least reliable measure, normalized discounted cumulative gain and normalized average precision over all documents are the most reliable but least sensitive measures, and R-precision lies in between.
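As a minimal sketch of the three standard measures named above (not the authors' implementation, and omitting their newly introduced normalized average precision over all documents, whose definition is given in the full paper), each can be computed from a ranked list of relevance labels; all names below are illustrative:

```python
import math

def average_precision(ranked_rels, num_relevant):
    """AP: mean of precision at each rank where a relevant document appears,
    divided by the total number of relevant documents in the collection."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0

def r_precision(ranked_rels, num_relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    if num_relevant == 0:
        return 0.0
    return sum(ranked_rels[:num_relevant]) / num_relevant

def ndcg(ranked_gains, k=None):
    """Normalized discounted cumulative gain over graded relevance gains:
    DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    k = k or len(ranked_gains)
    def dcg(gains):
        # log2(rank + 1) discount, so rank 1 contributes its full gain
        return sum(g / math.log2(r + 1) for r, g in enumerate(gains[:k], start=1))
    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0
```

Note that `num_relevant` is exactly the quantity that incomplete judging corrupts: with pooled judgments it counts only the relevant documents that were judged, which is why these measures cannot be computed exactly under incomplete relevance judgment.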





Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, S., McClean, S. (2006). Evaluation of System Measures for Incomplete Relevance Judgment in IR. In: Larsen, H.L., Pasi, G., Ortiz-Arroyo, D., Andreasen, T., Christiansen, H. (eds) Flexible Query Answering Systems. FQAS 2006. Lecture Notes in Computer Science, vol 4027. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11766254_21

  • DOI: https://doi.org/10.1007/11766254_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34638-8

  • Online ISBN: 978-3-540-34639-5

  • eBook Packages: Computer Science (R0)
