Fast Block-Compressed Inverted Lists

  • Giovanni M. Sacco
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7446)

Abstract

New techniques for compressing and storing inverted lists are presented. Unlike previous research, these techniques are designed specifically for volatile inverted lists. They combine several types of compression (including prefix compression) with block segmentation to allow easy insertion and deletion of pointers and, most importantly, to significantly reduce execution times while keeping storage requirements close to those of a baseline monolithic inverted list implementation based on Elias’s γ codes. Inverted lists for information retrieval are addressed, and experimental results are reported. The best method uses an optimized block-oriented evaluation that efficiently skips irrelevant pointers; its observed average execution time is less than 65% of that of the baseline implementation.
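
As a rough illustration of the ideas named in the abstract, the sketch below splits a sorted pointer list into blocks, stores the gaps inside each block with an Elias code (assumed here to be the γ code), and intersects the list with a set of candidate ids while decoding only the blocks whose id range can contain a candidate. It is not the paper's method: the block layout, the function names (`gamma_encode`, `build_blocks`, `intersect`), the block size, and the use of bit strings instead of packed bits are all assumptions made for readability, and prefix compression is not shown.

```python
from bisect import bisect_left

def gamma_encode(n):
    """Elias gamma code of a positive integer, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("gamma codes are defined for n >= 1")
    binary = bin(n)[2:]
    # (len - 1) zeros announce the length, then the binary digits follow.
    return "0" * (len(binary) - 1) + binary

def gamma_decode_all(bits):
    """Decode a concatenation of gamma codes into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values

def build_blocks(postings, block_size=4):
    """Split sorted, distinct ids into blocks of (first_id, last_id, coded_gaps).

    Hypothetical layout: the first id of each block is stored uncompressed,
    so a block can be located and decoded without touching its neighbours.
    """
    blocks = []
    for start in range(0, len(postings), block_size):
        chunk = postings[start:start + block_size]
        gaps = "".join(gamma_encode(b - a) for a, b in zip(chunk, chunk[1:]))
        blocks.append((chunk[0], chunk[-1], gaps))
    return blocks

def decode_block(block):
    """Rebuild the ids of one block from its first id and its coded gaps."""
    first, _last, gaps = block
    ids = [first]
    for gap in gamma_decode_all(gaps):
        ids.append(ids[-1] + gap)
    return ids

def intersect(blocks, sorted_ids):
    """Return (matching ids, number of blocks decoded).

    Blocks whose [first_id, last_id] range contains no candidate are skipped
    without being decoded.
    """
    lasts = [b[1] for b in blocks]
    result, decoded = [], 0
    cached_index, cached_ids = -1, set()
    for x in sorted_ids:
        k = bisect_left(lasts, x)          # first block whose last id >= x
        if k == len(blocks) or blocks[k][0] > x:
            continue                       # x falls outside every block
        if k != cached_index:
            cached_index, cached_ids = k, set(decode_block(blocks[k]))
            decoded += 1
        if x in cached_ids:
            result.append(x)
    return result, decoded

postings = [3, 7, 8, 15, 22, 40, 41, 90, 120, 121, 300]
blocks = build_blocks(postings, block_size=4)
matches, blocks_decoded = intersect(blocks, [8, 100, 121, 500])
```

In this toy run the list is stored as three blocks and only two of them are decoded; the candidates 100 and 500 are rejected from the block boundaries alone, which is the kind of skipping a block-oriented evaluation relies on.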

Keywords

Inverted Index, Single Record, Storage Overhead, Volatile Index, Inverted List
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Giovanni M. Sacco (1)
  1. Dipartimento di Informatica, Università di Torino, Torino, Italy
