A Linear Size Index for Approximate Pattern Matching

  • Ho-Leung Chan
  • Tak-Wah Lam
  • Wing-Kin Sung
  • Siu-Lung Tam
  • Swee-Seong Wong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4009)

Abstract

This paper revisits the problem of indexing a text S[1..n] to support searching substrings in S that match a given pattern P[1..m] with at most k errors. A naive solution either has a worst-case matching time complexity of \(\Omega(m^k)\) or requires \(\Omega(n^k)\) space. Devising a solution with better performance has been a challenge until Cole et al. [5] showed an \(O(n \log^k n)\)-space index that can support k-error matching in \(O(m + occ + \log^k n \log\log n)\) time, where occ is the number of occurrences. Motivated by the indexing of DNA, we investigate in this paper the feasibility of devising a linear-size index that still has a time complexity linear in m. In particular, we give an O(n)-space index that supports k-error matching in \(O(m + occ + (\log n)^{k(k+1)} \log\log n)\) worst-case time. Furthermore, the index can be compressed from O(n) words into O(n) bits with a slight increase in the time complexity.
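
To make the problem statement concrete, the following is a minimal sketch of k-error matching done without any index, by scanning the text with the classical dynamic-programming recurrence for approximate matching. It is not the index proposed in the paper. It assumes that "errors" means edit operations (insertion, deletion, substitution), which the abstract does not spell out, and the function name `k_error_end_positions` is hypothetical. The scan takes O(nm) time per query, which is the cost an index is meant to avoid.

```python
def k_error_end_positions(text: str, pattern: str, k: int) -> list[int]:
    """Return the 1-based positions j such that some substring of `text`
    ending at j matches `pattern` with at most k edit errors.

    Assumption: an "error" is one insertion, deletion, or substitution.
    """
    m = len(pattern)
    # col[i] holds the minimum edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    # Before any text is read, matching pattern[:i] costs i deletions.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # value for (i-1, j-1); col[0] stays 0 because
                            # a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new = min(
                prev_diag + cost,  # match or substitute
                col[i] + 1,        # extra text character
                col[i - 1] + 1,    # missing pattern character
            )
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    # With k = 0 this degenerates to exact matching.
    print(k_error_end_positions("abcabc", "abc", 0))
    # With k = 1, substrings such as "ab", "abx", and "abxc" qualify.
    print(k_error_end_positions("abxc", "abc", 1))
```

The sketch reports end positions only. Recovering the matching substrings themselves, or counting occ as in the bounds above, would need extra bookkeeping that is omitted here.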

References

  1. Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Indexing and dictionary matching with one error. In: Dehne, F., Gupta, A., Sack, J.-R., Tamassia, R. (eds.) WADS 1999. LNCS, vol. 1663, pp. 181–192. Springer, Heidelberg (1999)
  2. Buchsbaum, A.L., Goodrich, M.T., Westbrook, J.R.: Range searching over tree cross products. In: Paterson, M. (ed.) ESA 2000. LNCS, vol. 1879, pp. 120–131. Springer, Heidelberg (2000)
  3. Chavez, E., Navarro, G.: A metric index for approximate string matching. In: Rajsbaum, S. (ed.) LATIN 2002. LNCS, vol. 2286, pp. 181–195. Springer, Heidelberg (2002)
  4. Cobbs, A.: Fast approximate matching using suffix trees. In: Galil, Z., Ukkonen, E. (eds.) CPM 1995. LNCS, vol. 937, pp. 41–54. Springer, Heidelberg (1995)
  5. Cole, R., Gottlieb, L.A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of Symposium on Theory of Computing, pp. 91–100 (2004)
  6. Ferragina, P., Manzini, G.: Opportunistic Data Structures with Applications. In: Proceedings of Symposium on Foundations of Computer Science, pp. 390–398 (2000)
  7. Grossi, R., Vitter, J.S.: Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. In: Proceedings of Symposium on Theory of Computing, pp. 397–406 (2000)
  8. Huynh, T.N.D., Hon, W.K., Lam, T.W., Sung, W.K.: Approximate string matching using compressed suffix arrays. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 434–444. Springer, Heidelberg (2004)
  9. Lam, T.W., Sung, W.K., Wong, S.S.: Improved approximate string matching using compressed suffix data structures. In: Deng, X., Du, D.-Z. (eds.) ISAAC 2005. LNCS, vol. 3827, pp. 339–348. Springer, Heidelberg (2005)
  10. Maaß, M.G., Nowak, J.: Text indexing with errors. Technical Report TUM-10503, Fakultät für Informatik, TU München (March 2005)
  11. Manber, U., Myers, G.: Suffix Arrays: A New Method for On-Line String Searches. SIAM Journal on Computing 22(5), 935–948 (1993)
  12. McCreight, E.M.: A Space-economical Suffix Tree Construction Algorithm. Journal of the ACM 23(2), 262–272 (1976)
  13. Navarro, G., Baeza-Yates, R.: A Hybrid Indexing Method for Approximate String Matching. J. Discrete Algorithms 1(1), 205–209 (2000) (special issue on Matching Patterns)
  14. Sadakane, K.: Compressed suffix trees with full functionality. Theory of Computing Systems (accepted)
  15. Weiner, P.: Linear Pattern Matching Algorithms. In: Proceedings of Symposium on Switching and Automata Theory, pp. 1–11 (1973)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ho-Leung Chan (1)
  • Tak-Wah Lam (1)
  • Wing-Kin Sung (2)
  • Siu-Lung Tam (1)
  • Swee-Seong Wong (2)

  1. Department of Computer Science, University of Hong Kong
  2. Department of Computer Science, National University of Singapore
