Central-Rank-Based Collection Selection in Uncooperative Distributed Information Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Abstract

Collection selection is one of the key problems in distributed information retrieval. Due to resource constraints it is not usually feasible to search all collections in response to a query. Therefore, the central component (broker) selects a limited number of collections to be searched for the submitted queries. During the past decade, several collection selection algorithms have been introduced. However, their performance varies on different testbeds. We propose a new collection-selection method based on the ranking of downloaded sample documents. We test our method on six testbeds and show that our technique can significantly outperform other state-of-the-art algorithms in most cases. We also introduce a new testbed based on the TREC GOV2 documents.
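
As an illustration of the idea in the abstract, the following minimal sketch scores collections by the positions their sampled documents occupy in a ranking produced over a central index of all samples. The abstract does not give the scoring function, so the linear rank weight, the cutoff `gamma`, and all names used here (`score_collections`, `select_collections`, `central_ranking`) are assumptions made for illustration, not the paper's formulation.

```python
from collections import defaultdict


def score_collections(central_ranking, gamma=50):
    """Score collections from a ranked list of sampled documents.

    central_ranking: list of collection ids, one per sampled document,
        in the order the documents were ranked for the query.
    gamma: hypothetical cutoff; only the top `gamma` documents contribute.
    """
    scores = defaultdict(float)
    for rank, collection_id in enumerate(central_ranking[:gamma]):
        # Assumed linear weight: a document at rank j contributes gamma - j.
        scores[collection_id] += gamma - rank
    return dict(scores)


def select_collections(central_ranking, k, gamma=50):
    """Return the k highest-scoring collection ids (ties broken by id)."""
    scores = score_collections(central_ranking, gamma)
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


# Toy ranking: each entry names the collection a sampled document came from.
ranking = ["A", "B", "A", "C", "B", "A"]
print(select_collections(ranking, k=2, gamma=4))
```

This sketch ignores differences in collection size and sample size, which a practical broker would likely need to account for.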

Editor information

Giambattista Amati, Claudio Carpineto, Giovanni Romano

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Shokouhi, M. (2007). Central-Rank-Based Collection Selection in Uncooperative Distributed Information Retrieval. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_17

  • DOI: https://doi.org/10.1007/978-3-540-71496-5_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71494-1

  • Online ISBN: 978-3-540-71496-5

  • eBook Packages: Computer Science, Computer Science (R0)
