Reducing the Uncertainty in Resource Selection

  • Ilya Markov
  • Leif Azzopardi
  • Fabio Crestani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7814)


The distributed retrieval process is plagued by uncertainty. Sampling, selection, merging and ranking are all based on very limited information compared to centralized retrieval. In this paper, we focus on reducing the uncertainty within the resource selection phase by obtaining a number of estimates, rather than relying upon only one point estimate. We propose three methods for reducing uncertainty, which are compared against state-of-the-art baselines across three distributed retrieval testbeds. Our results show that the proposed methods significantly improve over the baselines, reduce the uncertainty, and improve the robustness of resource selection.
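
The idea of replacing a single point estimate with a number of estimates can be illustrated with a minimal sketch. The three proposed methods are not described in this abstract, so the code below does not reproduce them. It assumes a toy scoring function (`point_score`, a hypothetical name) and uses bootstrap resampling of a resource's sampled documents as one generic way to obtain several estimates; the helper names and parameters are likewise illustrative assumptions.

```python
import random
import statistics


def point_score(sample_docs, query_terms):
    """Toy score (an assumption, not the paper's scorer): fraction of
    sampled documents that contain every query term."""
    if not sample_docs:
        return 0.0
    hits = sum(
        1 for doc in sample_docs
        if all(term in doc.split() for term in query_terms)
    )
    return hits / len(sample_docs)


def multiple_estimates(sample_docs, query_terms, n_estimates=50, seed=0):
    """Score bootstrap resamples of the sample to get several estimates
    instead of a single point estimate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_estimates):
        # Draw a resample of the same size, with replacement.
        resample = rng.choices(sample_docs, k=len(sample_docs))
        estimates.append(point_score(resample, query_terms))
    return estimates


def summarize(estimates):
    """Return (mean, population standard deviation) of the estimates."""
    return statistics.mean(estimates), statistics.pstdev(estimates)
```

In such a sketch, the mean of the estimates could stand in for the resource score, while the standard deviation gives a rough indication of how uncertain that score is for the given sample.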





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ilya Markov (1)
  • Leif Azzopardi (2)
  • Fabio Crestani (1)
  1. University of Lugano, Lugano, Switzerland
  2. University of Glasgow, Glasgow, UK
