Encyclopedia of Database Systems

Living Edition
Editors: Ling Liu, M. Tamer Özsu

Text Index Compression

  • Roberto Konow
  • Gonzalo Navarro
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7993-3_945-2

Abstract

Text index compression is the problem of designing a reduced-space data structure that provides fast search on a text collection, seen as a set of documents.
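As a rough illustration of what a reduced-space structure with fast search can look like, the sketch below builds a tiny inverted index whose posting lists are stored as variable-byte coded document-ID gaps. The choice of an inverted index, the gap and variable-byte coding, and the names `vbyte_encode`, `vbyte_decode`, `build_index`, and `search` are all assumptions made for this example; the abstract itself does not prescribe any particular technique.

```python
from typing import Dict, List


def vbyte_encode(numbers: List[int]) -> bytes:
    """Variable-byte encode non-negative integers, 7 data bits per byte.

    Illustrative convention (an assumption): the last byte of each
    integer has its high bit set; earlier bytes have it clear.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("only non-negative integers are supported")
        while n >= 128:
            out.append(n & 0x7F)  # low 7 bits, more bytes follow
            n >>= 7
        out.append(n | 0x80)      # final byte, flagged with the high bit
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Inverse of vbyte_encode."""
    numbers: List[int] = []
    current, shift = 0, 0
    for byte in data:
        if byte & 0x80:           # final byte of this integer
            numbers.append(current | ((byte & 0x7F) << shift))
            current, shift = 0, 0
        else:
            current |= byte << shift
            shift += 7
    return numbers


def build_index(docs: List[str]) -> Dict[str, bytes]:
    """Map each term to its compressed list of document IDs."""
    postings: Dict[str, List[int]] = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            postings.setdefault(term, []).append(doc_id)
    index: Dict[str, bytes] = {}
    for term, ids in postings.items():
        # Store the first ID, then the differences between neighbours;
        # small gaps fit in a single byte each.
        gaps = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
        index[term] = vbyte_encode(gaps)
    return index


def search(index: Dict[str, bytes], term: str) -> List[int]:
    """Return the sorted document IDs that contain the term."""
    ids: List[int] = []
    total = 0
    for gap in vbyte_decode(index.get(term.lower(), b"")):
        total += gap              # undo the gap encoding with a running sum
        ids.append(total)
    return ids


docs = ["compressed text index", "text search", "index compression for text"]
index = build_index(docs)
print(search(index, "index"))
```

The space saving in this sketch comes from storing small gaps instead of full document IDs, while search stays fast because a posting list is decoded in a single sequential pass.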

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. Department of Computer Science, University of Chile, Santiago, Chile

Section editors and affiliations

  • Edie Rasmussen
  1. Library, Archival & Information Studies, The University of British Columbia, Vancouver, Canada