Efficient Compressed Inverted Index Skipping for Disjunctive Text-Queries

  • Simon Jonassen
  • Svein Erik Bratsberg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6611)

Abstract

In this paper we look at a combination of bulk-compression, partial query processing and skipping for document-ordered inverted indexes. We propose a new inverted index organization, and provide an updated version of the MaxScore method by Turtle and Flood and a skipping-adapted version of the space-limited adaptive pruning method by Lester et al. Both of our methods significantly reduce the number of processed elements and cut the average query latency by a factor of more than three. Our experiments with a real implementation and a large document collection should be valuable for further research on inverted index skipping and query processing optimizations.
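
To make the idea of skipping in a document-ordered inverted list concrete, here is a minimal sketch in Python. It is a generic illustration, not the index organization proposed in the paper: the class name `SkippingPostingList`, the method `next_geq`, the `block_size` parameter and the `blocks_read` counter are all hypothetical, and the blocks are plain Python lists rather than bulk-compressed chunks.

```python
from bisect import bisect_left


class SkippingPostingList:
    """Document-ordered posting list split into fixed-size blocks.

    Hypothetical structure for illustration only; it is not the
    index layout described in the paper.
    """

    def __init__(self, doc_ids, block_size=4):
        # doc_ids must be sorted in increasing order (document-ordered).
        self.blocks = [doc_ids[i:i + block_size]
                       for i in range(0, len(doc_ids), block_size)]
        # One skip entry per block: the largest docID stored in that block.
        self.skips = [block[-1] for block in self.blocks]
        # Counts how many blocks were actually opened; in a real system
        # this would correspond to blocks that had to be decompressed.
        self.blocks_read = 0

    def next_geq(self, target):
        """Return the smallest docID >= target, or None if there is none."""
        # Binary search over the skip entries finds the first block whose
        # maximum docID is >= target; all earlier blocks are skipped.
        b = bisect_left(self.skips, target)
        if b == len(self.blocks):
            return None
        self.blocks_read += 1
        block = self.blocks[b]
        # The block's last docID is >= target, so this index is in range.
        return block[bisect_left(block, target)]


postings = SkippingPostingList(list(range(0, 100, 3)), block_size=4)
print(postings.next_geq(50))   # 51
print(postings.blocks_read)    # 1
```

In this sketch a lookup touches only one block regardless of how far the target lies ahead, which is the general reason skipping reduces the number of processed elements.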

Keywords

Query Processing, Query Evaluation, Query Optimization, Inverted Index, Inverted List
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Boldi, P., Vigna, S.: Compressed perfect embedded skip lists for quick inverted-index lookups. In: Consens, M.P., Navarro, G. (eds.) SPIRE 2005. LNCS, vol. 3772, pp. 25–28. Springer, Heidelberg (2005)
  2. Broder, A., Carmel, D., Herscovici, M., Soffer, A., Zien, J.: Efficient query evaluation using a two-level retrieval process. In: Proc. CIKM. ACM, New York (2003)
  3. Buckley, C., Lewit, A.: Optimization of inverted vector searches. In: Proc. SIGIR, pp. 97–110. ACM, New York (1985)
  4. Büttcher, S., Clarke, C.: Index compression is good, especially for random access. In: Proc. CIKM, pp. 761–770. ACM, New York (2007)
  5. Büttcher, S., Clarke, C., Soboroff, I.: The TREC 2006 terabyte track. In: Proc. TREC (2006)
  6. Chierichetti, F., Lattanzi, S., Mari, F., Panconesi, A.: On placing skips optimally in expectation. In: Proc. WSDM, pp. 15–24. ACM, New York (2008)
  7. Clarke, C., Soboroff, I.: The TREC 2005 terabyte track. In: Proc. TREC (2005)
  8. Ding, S., He, J., Yan, H., Suel, T.: Using graphics processors for high-performance IR query processing. In: Proc. WWW, pp. 1213–1214. ACM, New York (2008)
  9. Lacour, P., Macdonald, C., Ounis, I.: Efficiency comparison of document matching techniques. In: Proc. ECIR (2008)
  10. Lester, N., Moffat, A., Webber, W., Zobel, J.: Space-limited ranked query evaluation using adaptive pruning. In: Ngu, A.H.H., Kitsuregawa, M., Neuhold, E.J., Chung, J.-Y., Sheng, Q.Z. (eds.) WISE 2005. LNCS, vol. 3806, pp. 470–477. Springer, Heidelberg (2005)
  11. Moffat, A., Webber, W., Zobel, J., Baeza-Yates, R.: A pipelined architecture for distributed text query evaluation. Inf. Retr. 10(3), 205–231 (2007)
  12. Moffat, A., Zobel, J.: Self-indexing inverted files for fast text retrieval. ACM Trans. Inf. Syst. 14(4), 349–379 (1996)
  13. Strohman, T., Croft, W.: Efficient document retrieval in main memory. In: Proc. SIGIR, pp. 175–182. ACM, New York (2007)
  14. Strohman, T., Turtle, H., Croft, W.: Optimization strategies for complex queries. In: Proc. SIGIR, pp. 219–225. ACM, New York (2005)
  15. Turtle, H., Flood, J.: Query evaluation: strategies and optimizations. Inf. Process. Manage. 31(6), 831–850 (1995)
  16. Yan, H., Ding, S., Suel, T.: Inverted index compression and query processing with optimized document ordering. In: Proc. WWW, pp. 401–410. ACM, New York (2009)
  17. Zhang, J., Long, X., Suel, T.: Performance of compressed inverted list caching in search engines. In: Proc. WWW, pp. 387–396. ACM, New York (2008)
  18. Zobel, J., Moffat, A.: Inverted files for text search engines. ACM Comput. Surv. 38(2) (2006)
  19. Zukowski, M., Heman, S., Nes, N., Boncz, P.: Super-scalar RAM-CPU cache compression. In: Proc. ICDE, p. 59. IEEE Computer Society, Los Alamitos (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Simon Jonassen (1)
  • Svein Erik Bratsberg (1)
  1. Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway