Reference Work Entry

Encyclopedia of Algorithms

pp 1-99

Compressed Text Indexing

2005; Ferragina, Manzini
  • Veli Mäkinen, Department of Computer Science, University of Helsinki
  • Gonzalo Navarro, Department of Computer Science, University of Chile

Keywords and Synonyms

Space-efficient text indexing; Compressed full-text indexing; Self-indexing

Problem Definition

Given a text string \( { T = t_1 t_2 \dots t_n } \) over an alphabet Σ of size σ, the compressed text indexing (CTI) problem asks to replace T with a space-efficient data structure capable of efficiently answering basic string matching and substring queries on T. Typical queries required from such an index are the following:

  • \( { count(P) } \): count how many times a given pattern string \( { P = p_1 p_2 \dots p_m } \) occurs in T.

  • \( { locate(P) } \): return the locations where P occurs in T.

  • \( { display(i,j) } \): return \( { T[i,j] } \).
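
The semantics of the three queries can be pinned down with a naive, uncompressed reference sketch that scans T directly. This is only meant to fix what the queries return, not how a compressed index answers them. The function names mirror the queries above; passing T as an explicit argument, counting overlapping occurrences, and requiring a non-empty P are assumptions of this sketch. Positions are 1-based, as in the problem definition.

```python
def locate(T, P):
    """Return the sorted 1-based starting positions where P occurs in T.

    Overlapping occurrences are reported (an assumption of this sketch).
    P is assumed to be non-empty.
    """
    positions = []
    start = T.find(P)                 # str.find returns a 0-based index or -1
    while start != -1:
        positions.append(start + 1)   # convert to 1-based text positions
        start = T.find(P, start + 1)  # continue one position to the right
    return positions


def count(T, P):
    """Return how many times P occurs in T."""
    return len(locate(T, P))


def display(T, i, j):
    """Return T[i, j], the substring from position i to j (1-based, inclusive)."""
    return T[i - 1:j]
```

A compressed index would have to return the same answers without keeping T in plain form; for example, on T = "mississippi" the pattern "ssi" occurs at positions 3 and 6.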

Key Results

An elegant solution to the problem is obtained by exploiting the connection between the Burrows-Wheeler Transform (BWT) [1] and the suffix array data structure [9]. The suffix array \( { SA[1,n] } \) of T is the permutation of text positions \( { (1 \dots n) } \) listing the suffixes ...
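
As a small illustration of that connection, the sketch below builds a suffix array by plain sorting and then reads the BWT off it. It assumes the standard definitions: the suffix array lists suffixes in lexicographic order, and the BWT collects the character preceding each suffix in that order. It also assumes that T ends with a unique terminator "$" smaller than every other symbol. The function names `suffix_array` and `bwt_from_sa` are hypothetical, and the quadratic-space sorting is for clarity only, not how a practical index is constructed.

```python
def suffix_array(T):
    """Return SA[1..n] as a Python list of 1-based suffix start positions.

    Suffixes are compared as plain strings (lexicographic order), which is
    simple but far from the most efficient construction.
    """
    n = len(T)
    return sorted(range(1, n + 1), key=lambda i: T[i - 1:])


def bwt_from_sa(T, SA):
    """Return the BWT of T given its suffix array.

    Assumes T ends with a unique smallest terminator such as "$".
    The k-th BWT character is the one preceding suffix SA[k] in T,
    wrapping around to the last character when SA[k] = 1.
    """
    # T[i - 2] is the 0-based index of text position i - 1; for i = 1 this
    # is T[-1], i.e. the terminator, which gives the cyclic wrap-around.
    return "".join(T[i - 2] for i in SA)
```

For T = "mississippi$" this gives SA = [12, 11, 8, 5, 2, 1, 10, 9, 7, 4, 6, 3] and the BWT "ipssm$pissii".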
