Reducing Reliance on Relevance Judgments for System Comparison by Using Expectation-Maximization

  • Conference paper
Advances in Information Retrieval (ECIR 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8416)

Abstract

Relevance judgments are often the most expensive part of information retrieval evaluation, and techniques for comparing retrieval systems using fewer relevance judgments have received significant attention in recent years. This paper proposes a novel system comparison method based on an expectation-maximization algorithm. In the expectation step, real-valued pseudo-judgments are estimated from a set of system results. In the maximization step, new system weights are learned from a combination of a limited number of actual human judgments and system pseudo-judgments for the remaining documents. The method can operate without any human judgments, and its accuracy improves as human judgments are incrementally added. Experiments on TREC Ad Hoc collections show strong correlation with system rankings produced from pooled human judgments, and comparison with existing baselines indicates that the new method achieves the same comparison reliability with fewer human judgments.
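For intuition, the sketch below shows, in Python, the kind of EM loop the abstract describes: a weighted vote over system results produces the expectation step's pseudo-judgments (overridden by human judgments where available), and a simple agreement score drives the maximization step's system-weight update. This is an illustration under stated assumptions, not the authors' exact formulation; the data layout, the voting rule, and the agreement measure are all hypothetical choices made for the example.

# A minimal sketch of the EM-style comparison loop described in the abstract.
# This is NOT the authors' exact formulation: the voting and weight-update
# rules below are simplified assumptions, and every name here (runs,
# human_judgments, em_system_comparison) is a hypothetical illustration.

def em_system_comparison(runs, human_judgments=None, iterations=20):
    """runs: {system: {doc_id: retrieval score scaled to [0, 1]}}
    human_judgments: {doc_id: 0 or 1} for a (possibly empty) judged subset.
    Returns normalized per-system weights usable for ranking the systems."""
    human_judgments = human_judgments or {}
    docs = {d for run in runs.values() for d in run}
    weights = {s: 1.0 / len(runs) for s in runs}   # start uniform

    for _ in range(iterations):
        # E-step: each document's pseudo-judgment is a weighted vote of the
        # systems (a system that did not retrieve the document votes 0);
        # actual human judgments override the estimate where available.
        pseudo = {}
        for d in docs:
            if d in human_judgments:
                pseudo[d] = float(human_judgments[d])
            else:
                pseudo[d] = sum(w * runs[s].get(d, 0.0)
                                for s, w in weights.items())

        # M-step: re-weight each system by how well its scores agree with
        # the current pseudo-judgments (1 minus mean absolute error here,
        # purely as an illustrative agreement measure).
        new_weights = {}
        for s, run in runs.items():
            err = sum(abs(run[d] - pseudo[d]) for d in run) / len(run) if run else 1.0
            new_weights[s] = max(1e-6, 1.0 - err)
        total = sum(new_weights.values())
        weights = {s: w / total for s, w in new_weights.items()}

    return weights


if __name__ == "__main__":
    # Toy example: three systems, five documents, two human judgments.
    runs = {
        "sysA": {"d1": 0.9, "d2": 0.8, "d3": 0.1},
        "sysB": {"d1": 0.7, "d3": 0.9, "d4": 0.6},
        "sysC": {"d2": 0.2, "d4": 0.3, "d5": 0.8},
    }
    print(em_system_comparison(runs, human_judgments={"d1": 1, "d5": 0}))

As sketched, systems that agree with the (partly human-anchored) pseudo-judgments gain weight, which in turn sharpens the next round of pseudo-judgments; adding more human judgments constrains the E-step further, which matches the abstract's claim that accuracy improves as judgments are incrementally added.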




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Gao, N., Webber, W., Oard, D.W. (2014). Reducing Reliance on Relevance Judgments for System Comparison by Using Expectation-Maximization. In: de Rijke, M., et al. Advances in Information Retrieval. ECIR 2014. Lecture Notes in Computer Science, vol 8416. Springer, Cham. https://doi.org/10.1007/978-3-319-06028-6_1

  • DOI: https://doi.org/10.1007/978-3-319-06028-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06027-9

  • Online ISBN: 978-3-319-06028-6

  • eBook Packages: Computer Science (R0)
