Central-Rank-Based Collection Selection in Uncooperative Distributed Information Retrieval

Shokouhi, Milad

doi:10.1007/978-3-540-71496-5_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4425)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Collection selection is one of the key problems in distributed information retrieval. Due to resource constraints, it is usually not feasible to search all collections in response to a query. Therefore, the central component (broker) selects a limited number of collections to be searched for each submitted query. During the past decade, several collection selection algorithms have been introduced. However, their performance varies across testbeds. We propose a new collection-selection method based on the ranking of downloaded sample documents. We test our method on six testbeds and show that our technique can significantly outperform other state-of-the-art algorithms in most cases. We also introduce a new testbed based on the TREC GOV2 documents.
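
The general idea of selecting collections from a ranking of sampled documents can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract gives no formula, so the term-overlap ranker, the linear rank weight, and the names `rank_samples`, `select_collections`, `gamma`, and `SAMPLES` are all assumptions made for illustration. The broker ranks the documents it has sampled from every collection against the query, then credits each collection according to how highly its samples are ranked.

```python
from collections import defaultdict


def rank_samples(query, samples):
    """Rank sampled documents with a toy term-overlap score.

    This stands in for whatever retrieval model the broker really uses
    over its central index of sampled documents (assumption).
    """
    q_terms = set(query.lower().split())
    scored = []
    for doc_id, (collection, text) in samples.items():
        overlap = len(q_terms & set(text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc_id, collection))
    # Higher overlap first; doc_id breaks ties so the order is deterministic.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(doc_id, collection) for _, doc_id, collection in scored]


def select_collections(query, samples, k=2, gamma=50):
    """Return up to k collections, scored from the top-gamma sample ranking."""
    ranking = rank_samples(query, samples)[:gamma]
    scores = defaultdict(float)
    for rank, (_, collection) in enumerate(ranking):
        # Linear rank-based weight: an illustrative choice, not the paper's.
        scores[collection] += gamma - rank
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ordered[:k]]


# Hypothetical sampled documents: doc_id -> (source collection, text).
SAMPLES = {
    "d1": ("news", "distributed retrieval of news articles"),
    "d2": ("gov", "government reports on tax policy"),
    "d3": ("news", "federated search and collection selection"),
    "d4": ("web", "collection selection for distributed retrieval"),
}

selected = select_collections("collection selection distributed retrieval", SAMPLES)
```

In this toy setup a collection with several moderately ranked samples can outscore one with a single top-ranked sample. That is a consequence of the assumed weighting, not a claim about the published method.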

Author information

Authors and Affiliations

School of Computer Science and Information Technology, RMIT University, Melbourne 3001, Australia
Milad Shokouhi

Editor information

Giambattista Amati, Claudio Carpineto, Giovanni Romano

About this paper

Cite this paper

Shokouhi, M. (2007). Central-Rank-Based Collection Selection in Uncooperative Distributed Information Retrieval. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_17

DOI: https://doi.org/10.1007/978-3-540-71496-5_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71494-1
Online ISBN: 978-3-540-71496-5
eBook Packages: Computer Science, Computer Science (R0)
