A Linear Size Index for Approximate Pattern Matching

Chan, Ho-Leung; Lam, Tak-Wah; Sung, Wing-Kin; Tam, Siu-Lung; Wong, Swee-Seong

doi:10.1007/11780441_6


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4009)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

This paper revisits the problem of indexing a text S[1..n] to support searching substrings in S that match a given pattern P[1..m] with at most k errors. A naive solution either has a worst-case matching time complexity of Ω(m^k) or requires Ω(n^k) space. Devising a solution with better performance has been a challenge until Cole et al. [5] showed an O(n log^k n)-space index that can support k-error matching in O(m + occ + log^k n log log n) time, where occ is the number of occurrences. Motivated by the indexing of DNA, we investigate in this paper the feasibility of devising a linear-size index that still has a time complexity linear in m. In particular, we give an O(n)-space index that supports k-error matching in O(m + occ + (log n)^{k(k+1)} log log n) worst-case time. Furthermore, the index can be compressed from O(n) words into O(n) bits with a slight increase in the time complexity.
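
To make the query semantics concrete, the following is a minimal sketch of k-error matching done by a plain online scan, with no index at all. It is not the index described in the abstract. It assumes that "errors" means edit operations (insertion, deletion, substitution), which the abstract does not state explicitly, and it uses the classical column-wise dynamic program for approximate matching. The function name `k_error_end_positions` is hypothetical. The scan takes O(nm) time per query, which is the kind of cost an index is meant to avoid.

```python
def k_error_end_positions(text, pattern, k):
    """Return the 1-based end positions j in text such that some substring
    ending at j is within edit distance k of pattern.

    Baseline online scan (assumption: errors are edit operations);
    this is NOT the linear-size index from the paper.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any
    # substring of text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # entry for row i-1 in the previous column
        col[0] = 0          # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra character in the text
                         col[i - 1] + 1)    # pattern character left unmatched
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


print(k_error_end_positions("abcde", "bxd", 1))
```

With S = "abcde", P = "bxd" and k = 1, the only reported end position is 4, corresponding to the substring "bcd", which differs from P by one substitution.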

References

[1] Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Indexing and dictionary matching with one error. In: Dehne, F., Gupta, A., Sack, J.-R., Tamassia, R. (eds.) WADS 1999. LNCS, vol. 1663, pp. 181–192. Springer, Heidelberg (1999)
[2] Buchsbaum, A.L., Goodrich, M.T., Westbrook, J.R.: Range searching over tree cross products. In: Paterson, M. (ed.) ESA 2000. LNCS, vol. 1879, pp. 120–131. Springer, Heidelberg (2000)
[3] Chavez, E., Navarro, G.: A metric index for approximate string matching. In: Rajsbaum, S. (ed.) LATIN 2002. LNCS, vol. 2286, pp. 181–195. Springer, Heidelberg (2002)
[4] Cobbs, A.: Fast approximate matching using suffix trees. In: Galil, Z., Ukkonen, E. (eds.) CPM 1995. LNCS, vol. 937, pp. 41–54. Springer, Heidelberg (1995)
[5] Cole, R., Gottlieb, L.A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of Symposium on Theory of Computing, pp. 91–100 (2004)
[6] Ferragina, P., Manzini, G.: Opportunistic Data Structures with Applications. In: Proceedings of Symposium on Foundations of Computer Science, pp. 390–398 (2000)
[7] Grossi, R., Vitter, J.S.: Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. In: Proceedings of Symposium on Theory of Computing, pp. 397–406 (2000)
[8] Huynh, T.N.D., Hon, W.K., Lam, T.W., Sung, W.K.: Approximate string matching using compressed suffix arrays. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 434–444. Springer, Heidelberg (2004)
[9] Lam, T.W., Sung, W.K., Wong, S.S.: Improved approximate string matching using compressed suffix data structures. In: Deng, X., Du, D.-Z. (eds.) ISAAC 2005. LNCS, vol. 3827, pp. 339–348. Springer, Heidelberg (2005)
[10] Maaß, M.G., Nowak, J.: Text indexing with errors. Technical Report TUM-10503, Fakultät für Informatik, TU München (March 2005)
[11] Manber, U., Myers, G.: Suffix Arrays: A New Method for On-Line String Searches. SIAM Journal on Computing 22(5), 935–948 (1993)
[12] McCreight, E.M.: A Space-economical Suffix Tree Construction Algorithm. Journal of the ACM 23(2), 262–272 (1976)
[13] Navarro, G., Baeza-Yates, R.: A Hybrid Indexing Method for Approximate String Matching. J. Discrete Algorithms 1(1), 205–209 (2000) (special issue on Matching Patterns)
[14] Sadakane, K.: Compressed suffix trees with full functionality. Theory of Computing Systems (accepted)
[15] Weiner, P.: Linear Pattern Matching Algorithms. In: Proceedings of Symposium on Switching and Automata Theory, pp. 1–11 (1973)

Author information

Authors and Affiliations

Department of Computer Science, University of Hong Kong,
Ho-Leung Chan, Tak-Wah Lam & Siu-Lung Tam
Department of Computer Science, National University of Singapore,
Wing-Kin Sung & Swee-Seong Wong

Editor information

Editors and Affiliations

Department of Computer Science, Bar-Ilan University, 52900, Ramat-Gan, Israel
Moshe Lewenstein
Department of Software, Technical University of Catalonia, 08034, Barcelona, Spain
Gabriel Valiente

About this paper

Cite this paper

Chan, HL., Lam, TW., Sung, WK., Tam, SL., Wong, SS. (2006). A Linear Size Index for Approximate Pattern Matching. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_6

DOI: https://doi.org/10.1007/11780441_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35455-0
Online ISBN: 978-3-540-35461-1
eBook Packages: Computer Science, Computer Science (R0)