Structured Index Organizations for High-Throughput Text Querying

  • Vo Ngoc Anh
  • Alistair Moffat
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4209)

Abstract

Inverted indexes are the preferred mechanism for supporting content-based queries in text retrieval systems, with the various data items usually stored compressed in some way. But different query modalities require that different information be held in the index. For example, phrase querying requires that word offsets be held as well as document numbers. In this study we describe an inverted index organization that provides efficient support for all of conjunctive Boolean queries, ranked queries, and phrase queries. Experimental results on a 426 GB document collection show that the methods we describe provide fast evaluation of all three querying modes.
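To make the point about word offsets concrete, the following is a minimal sketch of a positional inverted index. It is not the index organization described in the paper: it is uncompressed, held in memory, and omits ranked querying. The function names `build_index`, `conjunctive_query`, and `phrase_query` are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: [word offsets]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for offset, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(offset)
    return index


def conjunctive_query(index, terms):
    """Return sorted doc ids containing every term; uses document numbers only."""
    postings = [set(index.get(t, {})) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def phrase_query(index, terms):
    """Return sorted doc ids in which the terms occur at consecutive offsets."""
    hits = []
    # Only documents containing all terms can contain the phrase.
    for doc_id in conjunctive_query(index, terms):
        later = [set(index[t][doc_id]) for t in terms[1:]]
        for start in index[terms[0]][doc_id]:
            # The k-th later term must sit k + 1 words after the first term.
            if all(start + k + 1 in offsets for k, offsets in enumerate(later)):
                hits.append(doc_id)
                break
    return hits


docs = [
    "inverted index for text retrieval",
    "text index compression",
    "retrieval of compressed text index",
]
index = build_index(docs)
print(conjunctive_query(index, ["text", "index"]))
print(phrase_query(index, ["text", "index"]))
```

In this sketch the conjunctive query never touches the offset lists, while the phrase query cannot be answered without them, which is why the two modalities place different demands on what the index stores.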

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Vo Ngoc Anh (1)
  • Alistair Moffat (1)
  1. Department of Computer Science and Software Engineering, The University of Melbourne, Australia
