Abstract
The quality of training and testing datasets is critical when models are trained and evaluated on annotated data. In Information Retrieval (IR), human experts annotate documents as relevant or not relevant to a given query. Such relevance judgments are inherently subjective and dynamic, yet the judgments of a small group of experts are usually taken as ground truth to “objectively” evaluate the performance of an IR system. Recent work employs larger groups of judges, for example through crowdsourcing, to alleviate the potentially biased judgments that stem from relying on a single expert. Nevertheless, different judges may hold different opinions and disagree with one another, and this inconsistency in human relevance judgment can affect IR evaluation results. Furthermore, previous research has focused mainly on the quality of the judged documents rather than on the quality of the queries submitted to an IR system. In this research, we introduce the Relevance Judgment Convergence Degree (RJCD) to measure the quality of queries in evaluation datasets. Experimental results reveal a strong correlation between the proposed RJCD score and the performance difference between two IR systems.
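The abstract does not spell out how RJCD is computed, so the following is only an illustrative sketch, not the chapter's method. It assumes RJCD can be approximated as the mean fraction of assessors who agree with each document's majority label for a query, then mirrors the reported analysis by correlating per-query RJCD scores with the performance gap between two IR systems. The function name `rjcd`, the noise rates, and the `perf_diff` values are all hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr

def rjcd(judgments: np.ndarray) -> float:
    """Hypothetical per-query convergence score.

    `judgments` is an (assessors x documents) matrix of binary relevance
    labels. RJCD is sketched here as the mean fraction of assessors who
    agree with each document's majority label; the chapter's actual
    definition may differ.
    """
    majority = (judgments.mean(axis=0) >= 0.5).astype(int)
    return float((judgments == majority).mean())

rng = np.random.default_rng(42)

# Simulate 6 queries, each judged by 5 assessors over 30 documents.
# Each assessor flips the "true" label with a query-specific noise rate,
# so low-noise queries converge and high-noise queries diverge.
noise_rates = [0.05, 0.10, 0.15, 0.25, 0.35, 0.45]
scores = []
for noise in noise_rates:
    truth = rng.integers(0, 2, size=30)
    flips = (rng.random((5, 30)) < noise).astype(int)
    scores.append(rjcd(np.abs(truth - flips)))  # XOR: flip label where flips == 1

# Hypothetical per-query performance gap between two IR systems
# (e.g. difference in average precision); illustrative numbers only.
perf_diff = [0.24, 0.20, 0.16, 0.09, 0.05, 0.01]

r, p = pearsonr(scores, perf_diff)
print("RJCD per query:", [round(s, 3) for s in scores])
print(f"Pearson r = {r:.3f} (p = {p:.4f})")
```

In the chapter's actual experiments, the performance differences would come from real system runs over evaluation collections such as TREC, rather than fabricated numbers as above.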
Notes
- 1. Text REtrieval Conference, https://trec.nist.gov/.
- 2.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Zhu, D., Nimmagadda, S.L., Wong, K.W., Reiners, T. (2023). Relevance Judgment Convergence Degree—A Measure of Assessors Inconsistency for Information Retrieval Datasets. In: Silaghi, G.C., et al. Advances in Information Systems Development. ISD 2022. Lecture Notes in Information Systems and Organisation, vol 63. Springer, Cham. https://doi.org/10.1007/978-3-031-32418-5_9
DOI: https://doi.org/10.1007/978-3-031-32418-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32417-8
Online ISBN: 978-3-031-32418-5
eBook Packages: Business and Management (R0)