
Cache-Oblivious Index for Approximate String Matching

  • Conference paper
Combinatorial Pattern Matching (CPM 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4580)


Abstract

This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text \(T\) of length \(n\) and a positive integer \(k\), we want to construct an index of \(T\) such that for any input pattern \(P\), we can find all its \(k\)-error matches in \(T\) efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies \(O((n \log^k n)/B)\) disk pages and finds all \(k\)-error matches with \(O((|P| + occ)/B + \log^k n \log\log_B n)\) I/Os, where \(B\) denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require \(\Omega(|P| + occ + \mathrm{poly}(\log n))\) I/Os. The second index reduces the space to \(O((n \log n)/B)\) disk pages, and the I/O complexity is \(O((|P| + occ)/B + \log^{k(k+1)} n \log\log n)\).
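To make the problem statement concrete, the following is a minimal sketch of what "finding all k-error matches" means. It is not the index described in the abstract. It is a naive index-free baseline (the classical column-by-column dynamic program, taking O(n|P|) time) and it assumes that "k-error" means edit distance at most k, which the abstract does not spell out. The function name `k_error_match_ends` and the convention of reporting 1-based end positions are illustrative choices, not taken from the paper.

```python
def k_error_match_ends(text: str, pattern: str, k: int) -> list[int]:
    """Return every 1-based end position j in text such that some
    substring of text ending at j is within edit distance k of pattern.

    Assumption: "k-error" is read as edit distance (insertions,
    deletions, substitutions). This is a brute-force scan over the
    text, shown only to illustrate the query an index must answer.
    """
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and any substring
    # of text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        new = [0] * (m + 1)  # new[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new[i] = min(
                col[i - 1] + cost,  # match or substitute
                col[i] + 1,         # extra character in text
                new[i - 1] + 1,     # extra character in pattern
            )
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    # "cde" differs from "cxe" by one substitution and ends at position 5.
    print(k_error_match_ends("abcdefg", "cxe", 1))
```

Every query in this baseline touches the whole text. An index with the bounds stated above would instead answer a query with I/O cost depending on \(|P|\), \(occ\), and polylogarithmic terms in \(n\).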

Research of T.W. Lam is supported by the Hong Kong RGC Grant 7140/06E. Research of R. Shah and J.S. Vitter is supported by NSF Grants IIS–0415097 and CCF–0621457, and ARO Grant DAAD 20–03–1–0321. Part of the work was done while W.K. Hon was at Purdue University.




Editor information

Bin Ma, Kaizhong Zhang


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hon, WK., Lam, TW., Shah, R., Tam, SL., Vitter, J.S. (2007). Cache-Oblivious Index for Approximate String Matching. In: Ma, B., Zhang, K. (eds) Combinatorial Pattern Matching. CPM 2007. Lecture Notes in Computer Science, vol 4580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73437-6_7


  • DOI: https://doi.org/10.1007/978-3-540-73437-6_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73436-9

  • Online ISBN: 978-3-540-73437-6

  • eBook Packages: Computer Science, Computer Science (R0)
