Abstract
The quality of training and testing datasets is critical when models are trained and evaluated on annotated data. In Information Retrieval (IR), human experts annotate documents as relevant or not relevant to a given query. Such relevance judgments are inherently subjective and dynamic, yet the judgments of a small group of experts are usually taken as ground truth to “objectively” evaluate the performance of an IR system. Recent work employs larger groups of judges, for example through crowdsourcing, to alleviate the potentially biased judgments that stem from relying on a single expert. Nevertheless, different judges may hold different opinions and disagree with one another, and this inconsistency in human relevance judgment can affect IR evaluation results. Furthermore, previous research has focused mainly on the quality of the judged documents rather than on the quality of the queries submitted to an IR system. In this research, we introduce the Relevance Judgment Convergence Degree (RJCD) to measure the quality of queries in evaluation datasets. Experimental results reveal a strong correlation between the proposed RJCD score and the performance difference between two IR systems.
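The abstract does not spell out how RJCD is computed, so the following is only an illustrative sketch, not the chapter's method. It assumes RJCD can be approximated as the mean fraction of assessors who agree with each document's majority label for a query, then mirrors the reported analysis by correlating per-query RJCD scores with the performance gap between two IR systems. The function name `rjcd`, the noise rates, and the `perf_diff` values are all hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr

def rjcd(judgments: np.ndarray) -> float:
    """Hypothetical per-query convergence score.

    `judgments` is an (assessors x documents) matrix of binary relevance
    labels. RJCD is sketched here as the mean fraction of assessors who
    agree with each document's majority label; the chapter's actual
    definition may differ.
    """
    majority = (judgments.mean(axis=0) >= 0.5).astype(int)
    return float((judgments == majority).mean())

rng = np.random.default_rng(42)

# Simulate 6 queries, each judged by 5 assessors over 30 documents.
# Each assessor flips the "true" label with a query-specific noise rate,
# so low-noise queries converge and high-noise queries diverge.
noise_rates = [0.05, 0.10, 0.15, 0.25, 0.35, 0.45]
scores = []
for noise in noise_rates:
    truth = rng.integers(0, 2, size=30)
    flips = (rng.random((5, 30)) < noise).astype(int)
    scores.append(rjcd(np.abs(truth - flips)))  # XOR: flip label where flips == 1

# Hypothetical per-query performance gap between two IR systems
# (e.g. difference in average precision); illustrative numbers only.
perf_diff = [0.24, 0.20, 0.16, 0.09, 0.05, 0.01]

r, p = pearsonr(scores, perf_diff)
print("RJCD per query:", [round(s, 3) for s in scores])
print(f"Pearson r = {r:.3f} (p = {p:.4f})")
```

In the chapter's actual experiments, the performance differences would come from real system runs over evaluation collections such as TREC, rather than fabricated numbers as above.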
Notes
- 1. Text REtrieval Conference, https://trec.nist.gov/.
- 2.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Zhu, D., Nimmagadda, S.L., Wong, K.W., Reiners, T. (2023). Relevance Judgment Convergence Degree—A Measure of Assessors Inconsistency for Information Retrieval Datasets. In: Silaghi, G.C., et al. Advances in Information Systems Development. ISD 2022. Lecture Notes in Information Systems and Organisation, vol 63. Springer, Cham. https://doi.org/10.1007/978-3-031-32418-5_9
DOI: https://doi.org/10.1007/978-3-031-32418-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32417-8
Online ISBN: 978-3-031-32418-5
eBook Packages: Business and Management (R0)