Algorithms for Within-Cluster Searches Using Inverted Files
Information retrieval over clustered document collections has two successive stages: first identifying the best-clusters, and then identifying the best-documents in these clusters, i.e., those most similar to the user query. In this paper, we assume that an inverted file over the entire document collection is used for the latter stage. We propose and evaluate algorithms for within-cluster searches, i.e., algorithms that integrate the best-clusters with the best-documents so that the final output contains only the highest-ranked documents from the best-clusters. Our experiments on a TREC collection containing 210,158 documents, with several query sets, show that an integration algorithm selected appropriately for the query length and available system resources can significantly improve query evaluation efficiency.
Keywords: Query Processing, Query Term, Information Retrieval System, Query Evaluation, Inverted Index
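The two-stage setting described in the abstract can be illustrated with a minimal sketch. It shows only one naive way to integrate the best-clusters with the best-documents (score every document through the full inverted file, then discard those outside the best-clusters); it is not claimed to be any of the algorithms proposed in the paper. The function name, the argument names, the posting format of (document id, weight) pairs, and the precomputed cluster scores are all assumptions made for illustration.

```python
from collections import defaultdict
import heapq


def rank_within_clusters(inverted_index, doc_cluster, query_weights,
                         cluster_scores, num_clusters, num_docs):
    """Naive within-cluster search (illustrative sketch, hypothetical API).

    inverted_index: term -> list of (doc_id, term_weight) postings
    doc_cluster:    doc_id -> cluster_id
    query_weights:  term -> query term weight
    cluster_scores: cluster_id -> query-cluster similarity (assumed given)
    """
    # Stage 1: pick the best-clusters by their similarity to the query.
    best_clusters = set(
        heapq.nlargest(num_clusters, cluster_scores, key=cluster_scores.get)
    )

    # Stage 2: term-at-a-time scoring over the inverted file of the
    # entire collection, one accumulator per document.
    accumulators = defaultdict(float)
    for term, q_weight in query_weights.items():
        for doc_id, d_weight in inverted_index.get(term, []):
            accumulators[doc_id] += q_weight * d_weight

    # Integration: keep only documents that belong to a best-cluster,
    # then return the highest-ranked ones.
    candidates = [
        (score, doc_id)
        for doc_id, score in accumulators.items()
        if doc_cluster[doc_id] in best_clusters
    ]
    top = heapq.nlargest(num_docs, candidates)
    return [(doc_id, score) for score, doc_id in top]


if __name__ == "__main__":
    index = {"a": [(1, 1.0), (2, 2.0), (3, 3.0)], "b": [(2, 1.0), (4, 5.0)]}
    clusters = {1: "c1", 2: "c1", 3: "c2", 4: "c3"}
    print(rank_within_clusters(index, clusters, {"a": 1.0, "b": 1.0},
                               {"c1": 0.9, "c2": 0.1, "c3": 0.8}, 2, 2))
```

In this sketch, accumulators are built for every document that matches a query term, including documents that are later discarded. That wasted work is the kind of cost that a more careful integration strategy would try to avoid.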