
Effect of Inverted Index Partitioning Schemes on Performance of Query Processing in Parallel Text Retrieval Systems

  • B. Barla Cambazoglu
  • Aytul Catal
  • Cevdet Aykanat
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4263)

Abstract

Shared-nothing parallel text retrieval systems require the inverted index that represents a document collection to be partitioned among a number of processors. In general, the index can be partitioned by either the terms or the documents in the collection, and the choice of partitioning greatly affects the query processing performance of the parallel system. In this work, we investigate the effect of these two index partitioning schemes on query processing. We conduct experiments on a 32-node PC cluster, considering the case where the index is stored entirely on disk. Performance results are reported for a large (30 GB) document collection using an MPI-based parallel query processing implementation.
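
As an illustration of the two partitioning schemes named in the abstract, the following is a minimal single-process sketch in Python. It is not the authors' MPI implementation: the toy documents, the round-robin assignment of terms and documents to processors, and all function names (`build_index`, `partition_by_term`, `partition_by_document`, `processors_touched`) are assumptions made for this example only.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def partition_by_term(index, k):
    """Term-based: each processor stores the complete posting lists
    of a subset of the terms (assumed round-robin over sorted terms)."""
    parts = [dict() for _ in range(k)]
    for i, term in enumerate(sorted(index)):
        parts[i % k][term] = index[term]
    return parts

def partition_by_document(index, k):
    """Document-based: each processor stores a local index restricted
    to its own documents (assumed assignment: doc_id modulo k)."""
    parts = [dict() for _ in range(k)]
    for term, postings in index.items():
        for doc_id in postings:
            parts[doc_id % k].setdefault(term, []).append(doc_id)
    return parts

def processors_touched(parts, query_terms):
    """Return the processors holding postings for any query term."""
    return [p for p, part in enumerate(parts)
            if any(term in part for term in query_terms)]

# Hypothetical toy collection, for illustration only.
docs = {
    0: "parallel text retrieval",
    1: "inverted index partitioning",
    2: "parallel query processing",
    3: "text index",
}
index = build_index(docs)
term_parts = partition_by_term(index, 2)
doc_parts = partition_by_document(index, 2)

print(processors_touched(term_parts, ["text"]))  # [1]
print(processors_touched(doc_parts, ["text"]))   # [0, 1]
```

In this toy setup, the single-term query `text` is answered by one processor under term-based partitioning, because the whole posting list lives in one place, while under document-based partitioning both processors contribute partial results that must be merged. Which behavior is preferable for a disk-resident index is the kind of trade-off the paper evaluates experimentally; the sketch makes no claim about it.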

Keywords

Query Processing; Document Collection; Query Term; Inverted Index; Disk Access
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • B. Barla Cambazoglu (1)
  • Aytul Catal (2)
  • Cevdet Aykanat (1)
  1. Department of Computer Engineering, Bilkent University, Bilkent, Ankara, Turkey
  2. Scientific and Technological Research Council of Turkey (TÜBİTAK), Kavaklıdere, Ankara, Turkey
