Hybrid Partition Inverted Files: Experimental Validation

  • Wensi Xi
  • Ohm Sornil
  • Ming Luo
  • Edward A. Fox
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2458)

Abstract

The rapid increase in content available in digital form gives rise to large digital libraries, intended to support millions of users and terabytes of data. Efficiently retrieving information is then a challenging task because of the size of the collection and its index. In this paper, our high-performance “hybrid” partition inverted index is validated through experiments with a 100 Gbyte collection from TREC-9 and TREC-10. The hybrid scheme combines the term and the document approaches to partitioning inverted indices across the nodes of a parallel system. Experiments on a parallel system show that this organization outperforms both the document and the term partitioning schemes. Our hybrid approach should support highly efficient searching for information in a large-scale digital library implemented atop a network of computers.
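
The three partitioning approaches named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the hybrid scheme splits or places inverted lists, so the chunk-based layout below (fixed-size chunks of each list placed round-robin across nodes) is an assumption, and the names `term_partition`, `document_partition`, `hybrid_partition`, `lookup`, and the `chunk_size` parameter are hypothetical.

```python
from collections import defaultdict
import zlib


def _term_node(term, num_nodes):
    # Deterministic term-to-node mapping (assumed; any stable hash would do).
    return zlib.crc32(term.encode("utf-8")) % num_nodes


def term_partition(index, num_nodes):
    """Term partitioning: each term's whole inverted list lives on one node."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for term, postings in index.items():
        nodes[_term_node(term, num_nodes)][term].extend(postings)
    return nodes


def document_partition(index, num_nodes):
    """Document partitioning: each node indexes its own subset of documents."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for term, postings in index.items():
        for doc_id in postings:
            nodes[doc_id % num_nodes][term].append(doc_id)
    return nodes


def hybrid_partition(index, num_nodes, chunk_size):
    """Assumed hybrid layout: split each inverted list into fixed-size
    chunks and place consecutive chunks on consecutive nodes."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for term, postings in index.items():
        start = _term_node(term, num_nodes)
        for i in range(0, len(postings), chunk_size):
            node = (start + i // chunk_size) % num_nodes
            nodes[node][term].extend(postings[i:i + chunk_size])
    return nodes


def lookup(nodes, term):
    """Gather a term's postings from every node and merge them."""
    merged = []
    for node in nodes:
        merged.extend(node.get(term, []))
    return sorted(merged)
```

In this sketch, a long inverted list is spread over several nodes under the hybrid layout, whereas term partitioning leaves it on a single node. All three layouts return the same merged postings for a term.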

Keywords

Digital Library, Query Term, Information Retrieval System, Inverted Index, Chunk Size
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Wensi Xi (1)
  • Ohm Sornil (2)
  • Ming Luo (1)
  • Edward A. Fox (1)
  1. Computer Science Department, Virginia Polytechnic Institute and State University, USA
  2. Information Systems Education Center, The National Institute of Development Administration, Bangkok, Thailand