
Reducing Reliance on Relevance Judgments for System Comparison by Using Expectation-Maximization

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 8416)

Abstract

Relevance judgments are often the most expensive part of information retrieval evaluation, and techniques for comparing retrieval systems using fewer relevance judgments have received significant attention in recent years. This paper proposes a novel system comparison method using an expectation-maximization algorithm. In the expectation step, real-valued pseudo-judgments are estimated from a set of system results. In the maximization step, new system weights are learned from a combination of a limited number of actual human judgments and system pseudo-judgments for the other documents. The method can work without any human judgments, and is able to improve its accuracy by incrementally adding human judgments. Experiments using TREC Ad Hoc collections demonstrate strong correlations with system rankings using pooled human judgments, and comparison with existing baselines indicates that the new method achieves the same comparison reliability with fewer human judgments.
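The alternating estimation described in the abstract can be sketched in code. The following is an illustrative reconstruction only, not the paper's exact formulation (which is behind the paywall): the E-step forms real-valued pseudo-judgments as a weighted vote of the systems' retrieval decisions, with available human judgments overriding the estimates, and the M-step reweights each system by its agreement with the current judgments. The function name, the 0/1 retrieval matrix encoding, and the agreement-based weight update are all assumptions made for this sketch.

```python
import numpy as np

def em_pseudo_judgments(R, judged=None, iters=20):
    """Sketch of the EM-based system-comparison idea from the abstract.

    R: (n_systems, n_docs) binary matrix; R[s, d] = 1 if system s
       retrieved document d. `judged`: dict {doc_index: 0/1 human
       judgment} for the (possibly empty) set of judged documents.
    The weighted-vote E-step and agreement-based M-step are
    illustrative assumptions, not the paper's exact update rules.
    """
    n_sys, n_docs = R.shape
    w = np.ones(n_sys) / n_sys          # uniform initial system weights
    judged = judged or {}
    for _ in range(iters):
        # E-step: pseudo-judgment per document = weighted vote of systems
        p = (w @ R) / w.sum()
        # human judgments, where available, override the estimates
        for d, rel in judged.items():
            p[d] = rel
        # M-step: reweight each system by agreement with current judgments
        agree = R @ p + (1 - R) @ (1 - p)
        w = agree / agree.sum()
    return p, w
```

With no entries in `judged` the loop runs purely on system consensus, matching the abstract's claim that the method can work without any human judgments; adding judgments incrementally simply pins more entries of `p` to their true values before each M-step.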

Keywords

  • Mean Average Precision
  • Human Judgment
  • System Weight
  • Test Collection
  • Selection Policy

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Gao, N., Webber, W., Oard, D.W. (2014). Reducing Reliance on Relevance Judgments for System Comparison by Using Expectation-Maximization. In: Advances in Information Retrieval. ECIR 2014. Lecture Notes in Computer Science, vol 8416. Springer, Cham. https://doi.org/10.1007/978-3-319-06028-6_1

  • DOI: https://doi.org/10.1007/978-3-319-06028-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06027-9

  • Online ISBN: 978-3-319-06028-6
