A New Filtration Method and a Hybrid Strategy for Approximate String Matching

Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 20)

Abstract

In this paper, we propose a new filtration algorithm and a hybrid filtration strategy for efficiently solving the approximate string matching problem (also called the k differences problem). The problem is to find all positions i in a given text such that some substring of the text ending at position i has an edit distance from a given pattern of at most a given error bound k. Our experimental results on simulated datasets of DNA sequences show that our filtration algorithm performs better at filtering out the positions of the text at which the pattern does not occur approximately. Moreover, our hybrid filtration strategy greatly improves the efficiency of our filtration algorithm when the ratio of the error bound to the pattern size is about 0.2.
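
To make the problem statement concrete, the following is a minimal sketch of the classical column-wise dynamic programming solution to the k differences problem. It is not the filtration algorithm or the hybrid strategy proposed in the paper, which the abstract does not describe in detail. The function name `k_differences` and the choice of 1-based end positions are assumptions made for illustration.

```python
def k_differences(pattern: str, text: str, k: int) -> list[int]:
    """Return the 1-based positions i of `text` such that some substring
    ending at i has edit distance <= k from `pattern`.

    Illustrative baseline only (standard dynamic programming),
    not the filtration method proposed in the paper.
    """
    m = len(pattern)
    # col[j] = minimum edit distance between pattern[:j] and any
    # substring of the text ending at the current text position.
    # Before any text is read, pattern[:j] costs j deletions.
    col = list(range(m + 1))
    hits = []
    for i, c in enumerate(text, start=1):
        prev_diag = col[0]  # value for (j-1, i-1); row 0 is always 0
        for j in range(1, m + 1):
            cost = 0 if pattern[j - 1] == c else 1
            new = min(
                prev_diag + cost,  # match or substitution
                col[j] + 1,        # extra text character
                col[j - 1] + 1,    # missing pattern character
            )
            prev_diag = col[j]     # keep old value for the next diagonal
            col[j] = new
        if col[m] <= k:
            hits.append(i)
    return hits


print(k_differences("abc", "xabcx", 1))  # prints [3, 4, 5]
```

This baseline takes time proportional to the product of the pattern length and the text length. A filtration algorithm aims to discard most text positions cheaply, so that a verification step of this kind only needs to run on the regions that remain.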

Keywords

approximate string matching, filtration, q-gram, hybrid

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Computer Science, National Tsing Hua University, Hsinchu City, Taiwan