
Relevance Judgment Convergence Degree—A Measure of Assessors Inconsistency for Information Retrieval Datasets

Chapter in Advances in Information Systems Development (ISD 2022).

Part of the book series: Lecture Notes in Information Systems and Organisation (LNISO, volume 63).


Abstract

The quality of training and testing datasets is critical when a model is trained and evaluated on annotated data. In Information Retrieval (IR), human experts annotate documents as relevant or not relevant to a given query. Relevance judgment by human assessors is inherently subjective and dynamic. However, the relevance judgments of a small group of experts are usually taken as ground truth to “objectively” evaluate the performance of an IR system. Recent trends employ a group of judges, such as by outsourcing judgments to a crowd of assessors, to alleviate the potentially biased judgment results that stem from relying on a single expert's judgment. Nevertheless, different judges may hold different opinions and may not agree with one another, and this inconsistency in human relevance judgment may affect IR system evaluation results. Further, previous research has focused mainly on the quality of documents rather than on the quality of the queries submitted to an IR system. In this research, we introduce the Relevance Judgment Convergence Degree (RJCD) to measure the quality of queries in evaluation datasets. Experimental results reveal a strong correlation between the proposed RJCD score and the performance differences between two IR systems.
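The chapter's formal definition of RJCD is not reproduced on this page, so the following Python sketch only illustrates the general idea under stated assumptions: a per-query convergence score is taken here to be the average fraction of assessors who agree with each document's majority label, and the per-query scores are then correlated (Pearson) with per-query performance differences between two IR systems. The function names, toy judgment matrices, and performance figures are all invented for illustration; they are not the paper's formula or data.

    # Hypothetical sketch of a convergence-style score per query.
    # Assumption: convergence = mean fraction of assessors agreeing
    # with each document's majority label; the chapter's actual RJCD
    # formula may differ.
    from statistics import mean
    from typing import Dict, List

    def convergence_degree(judgments: List[List[int]]) -> float:
        """judgments[d][a] is assessor a's binary label for document d.
        Returns the mean fraction of assessors agreeing with the
        per-document majority label (1.0 = full agreement)."""
        per_doc = []
        for labels in judgments:
            ones = sum(labels)
            majority = max(ones, len(labels) - ones)
            per_doc.append(majority / len(labels))
        return mean(per_doc)

    def pearson(xs: List[float], ys: List[float]) -> float:
        """Plain Pearson correlation coefficient."""
        mx, my = mean(xs), mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        return cov / (vx ** 0.5 * vy ** 0.5)

    # Toy example: three queries, each with three documents judged
    # by three assessors (binary relevance labels).
    judged: Dict[str, List[List[int]]] = {
        "q1": [[1, 1, 1], [0, 0, 0], [1, 1, 0]],  # high agreement
        "q2": [[1, 0, 1], [0, 1, 0], [1, 0, 0]],  # low agreement
        "q3": [[1, 1, 1], [1, 1, 1], [0, 0, 1]],
    }
    # Hypothetical per-query performance gap between two IR systems
    # (e.g., difference in average precision); invented numbers.
    perf_diff = {"q1": 0.21, "q2": 0.03, "q3": 0.17}

    scores = {q: convergence_degree(j) for q, j in judged.items()}
    qs = sorted(judged)
    print({q: round(scores[q], 3) for q in qs})
    print("correlation:", round(pearson([scores[q] for q in qs],
                                        [perf_diff[q] for q in qs]), 3))

On this toy data, the queries whose assessors converge strongly (scores near 1.0) also show the larger system-to-system performance gaps, which is the kind of relationship the abstract reports between RJCD and performance differences.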


Notes

  1. Text REtrieval Conference, https://trec.nist.gov/.

  2. http://www.odp.org/homepage.php.


Author information

Correspondence to Dengya Zhu.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Zhu, D., Nimmagadda, S.L., Wong, K.W., Reiners, T. (2023). Relevance Judgment Convergence Degree—A Measure of Assessors Inconsistency for Information Retrieval Datasets. In: Silaghi, G.C., et al. Advances in Information Systems Development. ISD 2022. Lecture Notes in Information Systems and Organisation, vol 63. Springer, Cham. https://doi.org/10.1007/978-3-031-32418-5_9
