On-Line Approximate String Matching with Bounded Errors

  • Marcos Kiwi
  • Gonzalo Navarro
  • Claudio Telha
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5029)

Abstract

We add a new dimension to the widely studied on-line approximate string matching problem by introducing an error threshold parameter ε, so that the algorithm is allowed to miss occurrences with probability ε. This is particularly appropriate for this problem, as approximate searching is used to model many cases where exact answers are not mandatory. We show that the relaxed version of the problem allows us to break the average-case optimal lower bound of the classical problem, achieving average-case \(O(n \log_\sigma m / m)\) time with any \(\epsilon = \textrm{poly}(k/m)\), where n is the text size, m the pattern length, k the number of errors for edit distance, and σ the alphabet size. Our experimental results show the practicality of this novel and promising research direction.
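
To make the problem statement concrete, the following is a minimal sketch of the classical (exact-answer) problem only: it reports every text position at which some substring ending there is within edit distance k of the pattern. It uses the standard column-wise dynamic programming recurrence, which takes O(mn) time, and is not the algorithm of this paper. The function name `approx_match_ends` is an assumption made for illustration. In the relaxed setting described above, an algorithm would be allowed to omit occurrences from such an output with probability ε.

```python
def approx_match_ends(pattern: str, text: str, k: int) -> list[int]:
    """Return every 0-based index j such that some substring of `text`
    ending at j is within edit distance k of `pattern`.

    Classical dynamic programming baseline (illustrative only, not the
    paper's method).
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # previous column's value at row i - 1
        # col[0] stays 0 because an occurrence may start anywhere.
        for i in range(1, m + 1):
            old = col[i]  # previous column's value at row i
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_match_ends("abc", "xxabcxx", 0))
    print(approx_match_ends("abc", "xxabcxx", 1))
```

With k = 0 the sketch reduces to exact matching; raising k also reports the positions next to each exact occurrence, since those substrings are within one edit of the pattern.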

Keywords

Bound Error; String Match; String Match Algorithm; Promising Research Direction; Pattern Substring
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Marcos Kiwi (1)
  • Gonzalo Navarro (2)
  • Claudio Telha (3)
  1. Departamento de Ingeniería Matemática, Centro de Modelamiento Matemático, UMI 2807 CNRS-UChile
  2. Department of Computer Science, University of Chile
  3. Operations Research Center, MIT
