Advances in Information Retrieval

Volume 8416 of the series Lecture Notes in Computer Science, pp. 359-371

On Inverted Index Compression for Search Engine Efficiency

  • Matteo Catena, GSSI - Gran Sasso Science Institute, INFN
  • Craig Macdonald, School of Computing Science, University of Glasgow
  • Iadh Ounis, School of Computing Science, University of Glasgow

Abstract

Efficient access to the inverted index data structure is key for a search engine to achieve fast response times to users’ queries. While the performance of an information retrieval (IR) system can be enhanced by compressing its posting lists, little recent work in the literature thoroughly compares and analyses the performance of modern integer compression schemes across different types of posting information (document ids, frequencies, positions). In this paper, we experiment with different modern integer compression algorithms, integrating them into a modern IR system. Through comprehensive experiments on two large, widely used document corpora and large query sets, our results show the benefit of compressing each type of posting information for both the space- and time-efficiency of the search engine. Overall, we find that the simple Frame of Reference (FOR) compression scheme results in the best query response times for all types of posting information. Moreover, we observe that frequency and position information is more challenging to compress in Web corpora that have large volumes of anchor text, yet compression still reduces average query response times.
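To make the Frame of Reference idea concrete, here is a minimal sketch, not the implementation evaluated in the paper. It assumes document ids are first turned into d-gaps (differences between consecutive ids). FOR then stores one reference value per block (here the block minimum) plus a single bit width, and packs every value minus the reference into that width. The function names are hypothetical. A Python integer stands in for the bit buffer, whereas production codecs pack fixed-size blocks into machine words.

```python
from typing import List, Tuple


def to_dgaps(doc_ids: List[int]) -> List[int]:
    """Turn a sorted list of document ids into d-gaps (first id kept as-is)."""
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]


def from_dgaps(gaps: List[int]) -> List[int]:
    """Invert to_dgaps with a running prefix sum."""
    out, total = [], 0
    for g in gaps:
        total += g
        out.append(total)
    return out


def for_encode(block: List[int]) -> Tuple[int, int, int]:
    """Frame of Reference: keep the block minimum as the reference and pack
    every (value - reference) using the same fixed bit width."""
    ref = min(block)
    width = max(v - ref for v in block).bit_length()  # bits for the largest offset
    packed = 0
    for i, v in enumerate(block):
        packed |= (v - ref) << (i * width)  # value i occupies bits [i*width, (i+1)*width)
    return ref, width, packed


def for_decode(ref: int, width: int, packed: int, n: int) -> List[int]:
    """Unpack n fixed-width offsets and add the reference back."""
    mask = (1 << width) - 1  # width 0 (all values equal) yields mask 0
    return [ref + ((packed >> (i * width)) & mask) for i in range(n)]
```

For example, the ids `[3, 7, 8, 15, 20]` give the d-gaps `[3, 4, 1, 7, 5]`. `for_encode` picks reference 1 and width 3, so the block needs 15 bits of payload instead of five 32-bit integers. Decoding needs only one shift and mask per value, which is one reason such simple fixed-width schemes can be fast to decompress.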