Algorithms for Within-Cluster Searches Using Inverted Files

  • Ismail Sengor Altingovde
  • Fazli Can
  • Özgür Ulusoy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4263)


Information retrieval over clustered document collections has two successive stages: first identifying the best-clusters, and then identifying the best-documents in those clusters, i.e., the documents most similar to the user query. In this paper, we assume that an inverted file over the entire document collection is used for the latter stage. We propose and evaluate algorithms for within-cluster searches, i.e., algorithms that integrate the best-clusters with the best-documents to produce a final output containing only the highest-ranked documents from the best-clusters. Our experiments on a TREC collection of 210,158 documents with several query sets show that an integration algorithm selected appropriately according to query length and system resources can significantly improve query evaluation efficiency.
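As an illustration only, the sketch below shows two hypothetical ways to restrict inverted-file query evaluation to the best-clusters. The first scores every document in the collection and filters afterwards. The second checks cluster membership while traversing the postings, so that no score accumulator is ever created for a document outside the best-clusters. The function names, the `doc_cluster` mapping, and the simple summed-weight scoring are assumptions made for this example; they are not the algorithms evaluated in the paper.

```python
from __future__ import annotations

import heapq
from collections import defaultdict

# Hypothetical representation: an inverted file over the ENTIRE collection,
# mapping each term to a list of (doc_id, weight) postings.


def _top_k(scored, k):
    # Highest score first; ties broken by the smaller doc_id.
    return heapq.nsmallest(k, scored, key=lambda item: (-item[1], item[0]))


def score_then_filter(query_terms, index, doc_cluster, best_clusters, k):
    """Score all documents, then keep only those in the best-clusters."""
    acc: dict[int, float] = defaultdict(float)
    for term in query_terms:
        for doc_id, weight in index.get(term, []):
            acc[doc_id] += weight  # assumed scoring: sum of term weights
    eligible = ((d, s) for d, s in acc.items() if doc_cluster[d] in best_clusters)
    return _top_k(eligible, k)


def filter_while_scoring(query_terms, index, doc_cluster, best_clusters, k):
    """Skip postings of documents outside the best-clusters during traversal."""
    acc: dict[int, float] = defaultdict(float)
    for term in query_terms:
        for doc_id, weight in index.get(term, []):
            if doc_cluster[doc_id] in best_clusters:
                acc[doc_id] += weight  # accumulators only for eligible docs
    return _top_k(acc.items(), k)


if __name__ == "__main__":
    index = {
        "cluster": [(0, 0.5), (1, 0.2), (3, 0.9)],
        "search": [(0, 0.4), (2, 0.7), (3, 0.1)],
    }
    doc_cluster = {0: 0, 1: 0, 2: 1, 3: 2}  # doc_id -> cluster_id
    best_clusters = {0, 1}  # assumed output of the cluster-selection stage
    query = ["cluster", "search"]
    print(filter_while_scoring(query, index, doc_cluster, best_clusters, 2))
```

In this toy run, document 3 has the highest overall score but belongs to cluster 2, which is not a best-cluster, so both functions return documents 0 and 2. The two strategies produce the same ranking and differ only in how many accumulators they allocate and how many membership checks they perform. That trade-off is one plausible reason why, as the abstract notes, the best integration choice can depend on query length and system resources.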






  1. 1.
    Altingovde, I.S., Can, F., Demir, E., Ulusoy, O.: Incremental cluster-based retrieval with embedded centroids using compressed cluster-skipping inverted files (submitted for publication)Google Scholar
  2. 2.
    Cacheda, F., Baeza-Yates, R.: An Optimistic Model for Searching Web Directories. In: McDonald, S., Tait, J.I. (eds.) ECIR 2004. LNCS, vol. 2997, pp. 364–377. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  3. 3.
    Cacheda, F., Carneiro, V., Guerrero, C., Viña, Á.: Optimization of restricted searches in Web directories using hybrid data structures. In: Sebastiani, F. (ed.) ECIR 2003. LNCS, vol. 2633, pp. 436–451. Springer, Heidelberg (2003)CrossRefGoogle Scholar
  4. 4.
    Can, F.: On the efficiency of best-match cluster searches. Information Processing and Management 30(3), 343–361 (1994)CrossRefGoogle Scholar
  5. 5.
    Can, F., Altingovde, I.S., Demir, E.: Efficiency and effectiveness of query processing in cluster-based retrieval. Information Systems 29(8), 697–717 (2004)CrossRefGoogle Scholar
  6. 6.
    Can, F., Ozkarahan, E.A.: Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases. ACM TODS 15(4), 483–517 (1990)CrossRefGoogle Scholar
  7. 7.
    Cambazoglu, B.B., Aykanat, C.: Performance of query processing implementations in ranking-based text retrieval systems using inverted indices. Information Processing and Management 42(4), 875–898 (2006)CrossRefGoogle Scholar
  8. 8.
    van Rijsbergen, C.J.: Information retrieval, 2nd edn. Butterworths, London (1979)Google Scholar
  9. 9.
    Salton, G.: Automatic text processing: the transformation, analysis, and retrieval of information by computer. Addison Wesley, Reading (1989)Google Scholar
  10. 10.
    Witten, I.H., Moffat, A., Bell, T.C.: Managing gigabytes compressing and indexing documents and images. Van Nostrand Reinhold, New York (1994)MATHGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ismail Sengor Altingovde (1)
  • Fazli Can (1)
  • Özgür Ulusoy (1)

  1. Department of Computer Engineering, Bilkent University, Turkey
