Distributed Information Retrieval and Applications

  • Fabio Crestani
  • Ilya Markov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7814)

Abstract

Distributed Information Retrieval (DIR) is a generic area of research that brings together techniques, such as resource selection and results aggregation, for dealing with data that, for organizational or technical reasons, cannot be managed centrally. Existing and potential applications of DIR methods range from blog retrieval to aggregated search and from multimedia and multilingual retrieval to distributed Web search. In this tutorial we briefly discuss the main DIR phases: resource description, resource selection, results merging and results presentation. The main focus is on applications of DIR techniques: blog, expert and desktop search, aggregated search and personal meta-search, and multimedia and multilingual retrieval. We also discuss a number of potential applications of DIR techniques, such as distributed Web search, enterprise search and aggregated mobile search.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Fabio Crestani (1)
  • Ilya Markov (1)
  1. University of Lugano, Lugano, Switzerland
