Strategic Pattern Search in Factor-Compressed Text

  • Simon Gog
  • Alistair Moffat
  • Matthias Petri
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8799)


We consider the problem of pattern search in compressed text in a context in which: (a) the text is stored as a sequence of factors against a static phrase-book; (b) factors are decoded from right to left; and (c) extraction of each symbol in each factor requires Θ(log σ) time, where σ is the size of the original alphabet. To determine possible alignments given information about decoded characters, we introduce two Boyer-Moore-like searching mechanisms, including one that makes use of a suffix array constructed over the pattern. The new mechanisms decode fewer than half the symbols that are required by a sequential left-to-right search such as the Knuth-Morris-Pratt approach, a saving that translates directly into improved execution time. Experiments with a two-level suffix array index structure for 4 GB of English text demonstrate the usefulness of the new techniques.
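To make the intuition behind the symbol savings concrete, the following is a minimal sketch of a generic Boyer-Moore-Horspool search over plain (uncompressed) text that counts how many text symbols it inspects. It is not one of the two mechanisms introduced in the paper, and it does not model factors, the phrase-book, or the suffix array over the pattern. The function name `horspool_search` and the `inspected` counter are illustrative assumptions. The point is only that comparing each alignment from right to left and skipping ahead on a mismatch can leave most text symbols untouched, whereas a left-to-right scan reads every symbol at least once.

```python
def horspool_search(text, pattern):
    """Return (match_positions, inspected), where inspected counts
    text-symbol comparisons. Illustrative sketch only; not the
    paper's algorithm."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0

    # Bad-character table: for each symbol in pattern[:-1], the distance
    # from its last occurrence to the final pattern position.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}

    matches, inspected = [], 0
    pos = 0
    while pos <= n - m:
        # Compare the current alignment from right to left.
        j = m - 1
        while j >= 0:
            inspected += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # Slide by the shift of the text symbol under the pattern's last
        # position; symbols absent from the table allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return matches, inspected


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    matches, inspected = horspool_search(text, "lazy")
    print(matches, inspected, len(text))
```

In a setting where each extracted symbol costs Θ(log σ) time, every symbol that such a search skips is decoding work avoided, which is the kind of saving the abstract refers to.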


Keywords

string search, pattern matching, suffix array, Burrows-Wheeler transform, succinct data structure, disk-based algorithm, experimental evaluation


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Simon Gog (1, 2)
  • Alistair Moffat (1)
  • Matthias Petri (1)

  1. Department of Computing and Information Systems, The University of Melbourne, Australia
  2. Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Germany

