Adaptive Query-Based Sampling of Distributed Collections
As part of a Distributed Information Retrieval system, a description of each remote information resource, archive or repository is usually stored centrally in order to facilitate resource selection. The acquisition of precise resource descriptions is therefore an important phase in Distributed Information Retrieval, as the quality of such representations affects selection accuracy and, ultimately, retrieval performance. While Query-Based Sampling is currently used for content discovery of uncooperative resources, its application depends on heuristic guidelines to determine when a sufficiently accurate representation of each remote resource has been obtained. In this paper we address this shortcoming by using the Predictive Likelihood both to indicate the quality of an estimated resource description and to determine when a sufficiently good representation of a resource has been obtained during Query-Based Sampling.
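The abstract describes using the Predictive Likelihood both to score a sampled resource description and to decide when to stop sampling. The Python sketch below illustrates that idea under simplifying assumptions that are not taken from the paper. The resource description is an add-one smoothed unigram language model. The predictive likelihood is the mean log-probability of a held-out sample of text (`held_out`). The remote resource is simulated by a hypothetical `search` function that returns up to `k` matching documents. Sampling stops when the relative change in predictive likelihood falls below a hypothetical threshold `epsilon`. The smoothing method, the held-out data, `vocab_size` and the stopping rule are all illustrative choices, not the authors' exact procedure.

```python
import math
import random
from collections import Counter


def predictive_log_likelihood(description, held_out, vocab_size):
    """Mean per-term log-likelihood of held-out text under an add-one
    (Laplace) smoothed unigram language model of the resource description.
    Assumption: vocab_size is a fixed guess at the vocabulary size."""
    total = sum(description.values())
    log_lik = 0.0
    for term in held_out:
        # Smoothing gives unseen terms a small non-zero probability.
        p = (description[term] + 1) / (total + vocab_size)
        log_lik += math.log(p)
    return log_lik / len(held_out)


def search(collection, term, k):
    """Hypothetical stand-in for a remote search interface: ids of up to
    k documents that contain the query term."""
    return [i for i, doc in enumerate(collection) if term in doc][:k]


def adaptive_qbs(collection, held_out, seed_term, vocab_size,
                 k=4, epsilon=0.01, max_queries=100, seed=0):
    """Query-Based Sampling that stops once the relative change in
    predictive likelihood between updates falls below epsilon
    (a hypothetical stopping rule chosen for illustration)."""
    rng = random.Random(seed)
    sampled_ids = set()
    description = Counter()
    query = seed_term
    previous_pl = current_pl = None
    n_queries = 0
    for n_queries in range(1, max_queries + 1):
        new_docs = [i for i in search(collection, query, k) if i not in sampled_ids]
        for doc_id in new_docs:
            sampled_ids.add(doc_id)
            description.update(collection[doc_id])
        if not description:
            break  # the seed query found nothing; no vocabulary to sample from
        if new_docs:
            # Re-estimate quality only when the description actually changed.
            current_pl = predictive_log_likelihood(description, held_out, vocab_size)
            if (previous_pl is not None
                    and abs(current_pl - previous_pl) < epsilon * abs(previous_pl)):
                break
            previous_pl = current_pl
        # Next query: a random term from the vocabulary learned so far.
        query = rng.choice(sorted(description))
    return description, sampled_ids, n_queries, current_pl
```

The sketch re-estimates the predictive likelihood only when a query returns previously unseen documents. Otherwise, a query that retrieves only documents already in the sample would register as zero change and trigger a premature stop.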
Keywords: Language Model, Resource Selection, Selection Accuracy, Comparable Indication, Remote Resource