On Optimizing Partitioning Strategies for Faster Inverted Index Compression

  • Conference paper
Computational Science and Its Applications -- ICCSA 2016 (ICCSA 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9789)

Abstract

The inverted index is a key component that lets a search engine manage billions of documents and respond quickly to users' queries. While substantial effort has gone into balancing space occupancy against decoding speed, the encoding speed when constructing the index has been overlooked. VSEncoding is a powerful encoder that works by optimally partitioning a list of integers into blocks, which are then compressed efficiently with simple encoders; however, these partitions are found with a dynamic programming approach, which is inefficient. In this paper, we introduce compression speed as a criterion for evaluating compression techniques and thoroughly analyze the performance of different partitioning strategies. We also propose a linear-time optimization that gives VSEncoding faster compression and more flexibility in partitioning an index. Experiments show that our method offers far better compression speed while retaining excellent space occupancy and decompression speed.
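
To make the dynamic programming step concrete, here is a minimal sketch of optimal block partitioning. It is not the authors' implementation and does not reproduce VSEncoding's actual cost model or the linear-time optimization proposed in the paper. It assumes a simplified cost in which each block pays a fixed header plus one fixed-width codeword per element, with the width set by the largest value in the block. The names `optimal_partition`, `bit_width`, `header_bits` and `max_block` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def bit_width(x: int) -> int:
    """Bits needed to store a non-negative integer (at least 1)."""
    return max(1, x.bit_length())


def optimal_partition(values: List[int], header_bits: int = 8,
                      max_block: int = 64) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (total_bits, blocks) minimizing an assumed cost model.

    Assumed cost of a block [start, end):
        header_bits + (end - start) * (bit width of the block's largest value)
    Blocks are half-open index ranges of at most `max_block` elements.
    """
    n = len(values)
    best = [0] + [float("inf")] * n   # best[j]: cheapest encoding of values[:j]
    prev = [0] * (n + 1)              # prev[j]: start of the last block in that encoding
    for end in range(1, n + 1):
        width = 0
        # Grow the candidate last block backwards from `end`, one element at a time,
        # so the running maximum width is updated in constant time per step.
        for start in range(end - 1, max(end - max_block, 0) - 1, -1):
            width = max(width, bit_width(values[start]))
            cost = best[start] + header_bits + (end - start) * width
            if cost < best[end]:
                best[end] = cost
                prev[end] = start
    # Walk the back-pointers to recover the chosen block boundaries.
    blocks = []
    end = n
    while end > 0:
        blocks.append((prev[end], end))
        end = prev[end]
    blocks.reverse()
    return best[n], blocks


if __name__ == "__main__":
    # One large outlier among small values: isolating it in its own block is cheapest.
    print(optimal_partition([1, 1, 1, 1, 1000, 1, 1, 1, 1]))
```

Under these assumptions the inner loop examines up to `max_block` candidate blocks for every position, so the running time grows with the list length times `max_block`. That per-position search is the encoding-time cost the abstract refers to.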

Author information

Corresponding author

Correspondence to Xingshen Song.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Song, X., Jiang, K., Jiang, Y., Yang, Y. (2016). On Optimizing Partitioning Strategies for Faster Inverted Index Compression. In: Gervasi, O., et al. Computational Science and Its Applications -- ICCSA 2016. ICCSA 2016. Lecture Notes in Computer Science, vol 9789. Springer, Cham. https://doi.org/10.1007/978-3-319-42089-9_18

  • DOI: https://doi.org/10.1007/978-3-319-42089-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42088-2

  • Online ISBN: 978-3-319-42089-9

  • eBook Packages: Computer Science, Computer Science (R0)
