Exact Analysis of Horspool’s and Sunday’s Pattern Matching Algorithms with Probabilistic Arithmetic Automata

  • Tobias Marschall
  • Sven Rahmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6031)

Abstract

We define deterministic arithmetic automata (DAAs) and connect them to a framework called probabilistic arithmetic automata (PAAs) [9]. We use DAAs and PAAs to compute the entire exact probability distribution (in contrast to, e.g., asymptotic expectation and variance) of the number \(X^p_\ell\) of text characters accessed by the Horspool or Sunday pattern matching algorithms when matching a fixed pattern p against a random text of length ℓ. The random text model can be quite general, from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). We develop several alternative constructions with different state spaces of the automata, leading to alternative time and space complexities for the computations. To our knowledge, this is the first time that suffix-based pattern matching algorithms are analyzed exactly. We present (perhaps surprising) exemplary results on short patterns and moderate text lengths. Our results easily generalize to any search-window based pattern matching algorithm.
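To make the quantity \(X^p_\ell\) concrete, the following minimal sketch counts the text characters that a textbook version of Horspool's algorithm compares, then obtains the distribution of that count by enumerating every text of length ℓ under a uniform i.i.d. model. This is brute force and exponential in ℓ; it is not the DAA/PAA construction of the paper, only a reference point for very small cases. The function names `horspool_accesses` and `access_distribution` are hypothetical, and the cost convention (one access per character comparison, with the shift lookup not counted again) is an assumption that may differ from the paper's.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def horspool_accesses(pattern, text):
    """Run a textbook Horspool search; return (character accesses, match positions)."""
    m, n = len(pattern), len(text)
    # Shift table: distance from the rightmost occurrence of each character
    # in pattern[:-1] to the last pattern position; default shift is m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    accesses = 0
    matches = []
    pos = 0
    while pos + m <= n:
        j = m - 1
        # Compare the window right to left, counting every text character read.
        while j >= 0:
            accesses += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift by the character aligned with the last pattern position.
        # Assumption: this character was already read above, so it is not recounted.
        pos += shift.get(text[pos + m - 1], m)
    return accesses, matches


def access_distribution(pattern, alphabet, length):
    """Exact distribution of the access count over all uniform texts (brute force)."""
    counts = Counter()
    for chars in product(alphabet, repeat=length):
        accesses, _ = horspool_accesses(pattern, "".join(chars))
        counts[accesses] += 1
    total = len(alphabet) ** length
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}
```

Calling `access_distribution("ab", "ab", 2)` enumerates the four texts of length 2 over the alphabet {a, b}. Such enumeration is only feasible for tiny ℓ, which is why an automaton-based computation, as proposed in the abstract, is needed for realistic text lengths and for non-uniform text models.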

References

  1. Baeza-Yates, R.A., Gonnet, G.H., Régnier, M.: Analysis of Boyer-Moore-type string searching algorithms. In: SODA ’90: Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms, pp. 328–343. SIAM, Philadelphia (1990)
  2. Baeza-Yates, R.A., Régnier, M.: Average running time of the Boyer-Moore-Horspool algorithm. Theor. Comput. Sci. 92(1), 19–31 (1992)
  3. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Communications of the ACM 20(10), 762–772 (1977)
  4. Herms, I., Rahmann, S.: Computing alignment seed sensitivity with probabilistic arithmetic automata. In: Crandall, K.A., Lagergren, J. (eds.) WABI 2008. LNCS (LNBI), vol. 5251, pp. 318–329. Springer, Heidelberg (2008)
  5. Horspool, R.N.: Practical fast searching in strings. Software-Practice and Experience 10, 501–506 (1980)
  6. Knuth, D.E., Morris, J., Pratt, V.R.: Fast pattern matching in strings. SIAM Journal on Computing 6(2), 323–350 (1977)
  7. Kucherov, G., Noé, L., Roytberg, M.: A unifying framework for seed sensitivity and its application to subset seeds. Journal of Bioinformatics and Computational Biology 4(2), 553–569 (2006)
  8. Mahmoud, H.M., Smythe, R.T., Régnier, M.: Analysis of Boyer-Moore-Horspool string-matching heuristic. Random Structures and Algorithms 10(1-2), 169–186 (1997)
  9. Marschall, T., Rahmann, S.: Probabilistic arithmetic automata and their application to pattern matching statistics. In: Ferragina, P., Landau, G.M. (eds.) CPM 2008. LNCS, vol. 5029, pp. 95–106. Springer, Heidelberg (2008)
  10. Marschall, T., Rahmann, S.: Efficient exact motif discovery. Bioinformatics 25(12), i356–i364 (2009)
  11. Navarro, G., Raffinot, M.: Flexible Pattern Matching in Strings. Cambridge University Press, Cambridge (2002)
  12. Schulz, M., Weese, D., Rausch, T., Döring, A., Reinert, K., Vingron, M.: Fast and adaptive variable order Markov chain construction. In: Crandall, K.A., Lagergren, J. (eds.) WABI 2008. LNCS (LNBI), vol. 5251, pp. 306–317. Springer, Heidelberg (2008)
  13. Smythe, R.T.: The Boyer-Moore-Horspool heuristic with Markovian input. Random Structures and Algorithms 18(2), 153–163 (2001)
  14. Sunday, D.M.: A very fast substring search algorithm. Communications of the ACM 33(8), 132–142 (1990)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Tobias Marschall (1)
  • Sven Rahmann (1)
  1. Bioinformatics for High-Throughput Technologies, Algorithm Engineering, Computer Science XI, TU Dortmund, Germany
