Information Retrieval Journal

Volume 19, Issue 4, pp 351–377

Assessing efficiency–effectiveness tradeoffs in multi-stage retrieval systems without using relevance judgments

  • Charles L. A. Clarke
  • J. Shane Culpepper
  • Alistair Moffat

Abstract

Large-scale retrieval systems are often implemented as a cascading sequence of phases—a first filtering step, in which a large set of candidate documents are extracted using a simple technique such as Boolean matching and/or static document scores; and then one or more ranking steps, in which the pool of documents retrieved by the filter is scored more precisely using dozens or perhaps hundreds of different features. The documents returned to the user are then taken from the head of the final ranked list. Here we examine methods for measuring the quality of filtering and preliminary ranking stages, and show how to use these measurements to tune the overall performance of the system. Standard top-weighted metrics used for overall system evaluation are not appropriate for assessing filtering stages, since the output is a set of documents, rather than an ordered sequence of documents. Instead, we use an approach in which a quality score is computed based on the discrepancy between filtered and full evaluation. Unlike previous approaches, our methods do not require relevance judgments, and thus can be used with virtually any query set. We show that this quality score directly correlates with actual differences in measured effectiveness when relevance judgments are available. Since the quality score does not require relevance judgments, it can be used to identify queries that perform particularly poorly for a given filter. Using these methods, we explore a wide range of filtering options using thousands of queries, categorize the relative merits of the different approaches, and identify useful parameter combinations.
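To make the evaluation idea concrete, the following is a minimal sketch of how such a discrepancy-based quality score might be computed for a single query: the ranking produced from the filtered candidate pool is compared against the ranking produced without the filter, using a top-weighted rank-similarity measure in the spirit of rank-biased overlap. The function names, the persistence parameter p, the evaluation depth, and the toy rankings are illustrative assumptions rather than the paper's exact formulation.

# Minimal sketch (Python), assuming rankings are lists of document identifiers.

def rank_biased_overlap(run_a, run_b, p=0.9, depth=100):
    """Finite-depth, top-weighted overlap of two rankings:
    (1 - p) * sum over d of p^(d-1) * |A_d intersect B_d| / d,
    where A_d and B_d are the top-d items of each ranking.
    Assumes neither ranking repeats a document identifier."""
    seen_a, seen_b = set(), set()
    overlap = 0           # size of the intersection of the two top-d prefixes
    score = 0.0
    weight = 1.0 - p      # equals (1 - p) * p^(d-1) at d = 1
    for d in range(1, depth + 1):
        if d <= len(run_a):
            doc = run_a[d - 1]
            if doc in seen_b:
                overlap += 1
            seen_a.add(doc)
        if d <= len(run_b):
            doc = run_b[d - 1]
            if doc in seen_a:
                overlap += 1
            seen_b.add(doc)
        score += weight * overlap / d
        weight *= p
    return score

def filter_quality(full_ranking, filtered_ranking, p=0.9, depth=100):
    """Per-query quality score, normalised so that two rankings that agree
    down to `depth` score 1.0; low values flag queries the filter damaged."""
    maximum = 1.0 - p ** depth
    return rank_biased_overlap(full_ranking, filtered_ranking, p, depth) / maximum

# Toy usage: the hypothetical filter discarded document "d4", which the full
# system had ranked second, so the score falls noticeably below 1.0.
full = ["d1", "d4", "d2", "d3", "d5"]
filtered = ["d1", "d2", "d3", "d5", "d6"]
print(round(filter_quality(full, filtered, depth=5), 3))

Averaging such per-query scores over a large query set gives a judgment-free estimate of how much a candidate filtering configuration perturbs the final ranking, and the lowest-scoring queries identify where the filter fails.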

Keywords

Search · Efficiency · Effectiveness · Experimentation · Measurement

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. University of Waterloo, Waterloo, Canada
  2. RMIT University, Melbourne, Australia
  3. The University of Melbourne, Melbourne, Australia
