Volume 11, Issue 3, pp 175-207
Date: 08 Jan 2008

Hybrid index maintenance for contiguous inverted lists

Abstract

Index maintenance strategies employed by dynamic text retrieval systems based on inverted files can be divided into two categories: merge-based and in-place update strategies. Within each category, individual update policies can be distinguished based on whether they store their on-disk posting lists in a contiguous or in a discontiguous fashion. Contiguous inverted lists generally yield higher query performance by minimizing disk seek overhead at query time, while discontiguous inverted lists yield higher update performance because they require less effort during index maintenance operations. In this paper, we focus on retrieval systems with high query load, where the on-disk posting lists have to be stored in a contiguous fashion at all times. We discuss a combination of re-merge and in-place index update, called Hybrid Immediate Merge. The method performs strictly better than the re-merge baseline policy used in our experiments: it delivers the same query performance with substantially better update performance. The actual time savings achievable depend on the size of the text collection being indexed; a larger collection results in greater savings. In our experiments, variations of Hybrid Immediate Merge were able to reduce the total index update overhead by up to 73% compared to the re-merge baseline.
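
To make the idea of combining re-merge with in-place update concrete, the following toy sketch may help. It is not the authors' implementation. It assumes, without support from the abstract, that a term's list is moved to in-place maintenance once it grows past a length threshold; the class name `HybridIndex`, the parameters `long_list_threshold` and `memory_limit`, and the `postings_copied` cost counter are all hypothetical, and the "on-disk" structures are plain Python dicts.

```python
from collections import defaultdict


class HybridIndex:
    """Toy model of a hybrid re-merge / in-place index (illustrative only)."""

    def __init__(self, long_list_threshold=4, memory_limit=6):
        # Assumption: lists longer than this are maintained in place.
        self.threshold = long_list_threshold
        # Assumption: flush once this many postings are buffered in memory.
        self.memory_limit = memory_limit
        self.buffer = defaultdict(list)  # in-memory postings per term
        self.buffered = 0
        self.merged = {}    # short lists, rewritten on every flush (re-merge)
        self.in_place = {}  # long lists, appended to in place
        self.postings_copied = 0  # crude stand-in for update cost

    def add(self, term, doc_id):
        """Buffer one posting; flush when the memory limit is reached."""
        self.buffer[term].append(doc_id)
        self.buffered += 1
        if self.buffered >= self.memory_limit:
            self.flush()

    def flush(self):
        """Apply buffered postings to the two 'on-disk' structures."""
        new_merged = {}
        for term in sorted(set(self.merged) | set(self.buffer)):
            fresh = self.buffer.get(term, [])
            if term in self.in_place:
                # In-place update: only the new postings are written.
                self.in_place[term].extend(fresh)
                self.postings_copied += len(fresh)
                continue
            # Re-merge: the whole list is read and written again.
            combined = self.merged.get(term, []) + fresh
            self.postings_copied += len(combined)
            if len(combined) > self.threshold:
                self.in_place[term] = combined  # promote the long list
            else:
                new_merged[term] = combined
        self.merged = new_merged
        self.buffer.clear()
        self.buffered = 0

    def postings(self, term):
        """Return the full posting list: on-disk part plus buffered part."""
        on_disk = self.in_place.get(term) or self.merged.get(term, [])
        return on_disk + self.buffer.get(term, [])


idx = HybridIndex(long_list_threshold=2, memory_limit=3)
for term, doc in [("the", 1), ("the", 2), ("cat", 2),
                  ("the", 3), ("the", 4), ("dog", 4),
                  ("the", 5), ("the", 6), ("the", 7)]:
    idx.add(term, doc)
print(idx.postings("the"), sorted(idx.in_place), sorted(idx.merged))
```

In this sketch, long lists stop being recopied on every flush once they are promoted, while short lists continue to be rewritten together. The toy does not model how a real system keeps an in-place list contiguous on disk, for example by relocating it when its preallocated space runs out.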

This work extends two earlier publications: “A hybrid approach to index maintenance in dynamic text retrieval systems” (S. Büttcher and C.L.A. Clarke, Proceedings of the 28th European Conference on Information Retrieval, London, UK, 2006) and “Hybrid index maintenance for growing text collections” (S. Büttcher, C.L.A. Clarke, and B. Lushman, Proceedings of the 29th ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, USA, 2006).