Perfect hashing for strings: Formalization and algorithms

  • Conference paper
Combinatorial Pattern Matching (CPM 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1075)

Abstract

Numbers and strings are two objects manipulated by most programs. Hashing has been well studied for numbers, and it has been effective in practice. In contrast, basic hashing issues for strings remain largely unexplored. In this paper, we identify and formulate the core hashing problem for strings, which we call substring hashing. Our main technical results are highly efficient sequential and parallel (CRCW PRAM) Las Vegas-type algorithms that determine a perfect hash function for substring hashing. For example, given a binary string of length n, one of our algorithms finds a perfect hash function in O(log n) time, O(n) work, and O(n) space; the hash value for any substring can then be computed in O(log log n) time using a single processor. Our approach relies on a novel use of the suffix tree of a string. In implementing our approach, we design optimal parallel algorithms for determining weighted ancestors on an edge-weighted tree, a problem that may be of independent interest.
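To make the goal concrete, here is a minimal sketch of what a collision-free hash over the substrings of one fixed string might look like. It is not the algorithm of the paper: it uses a brute-force quadratic table rather than a suffix tree. It also assumes one particular reading of the problem that the abstract does not spell out, namely that two substrings of the same length receive the same value exactly when they are equal, with values drawn from a range of size at most n. The function name `build_substring_ids` is hypothetical.

```python
def build_substring_ids(s: str) -> dict:
    """Brute-force illustration only (assumed formulation, not the paper's method).

    Returns a table mapping (start, length) to an integer id such that two
    substrings of the same length share an id exactly when they are equal.
    For each length, ids lie in range(len(s)).
    """
    n = len(s)
    ids = {}
    for length in range(1, n + 1):
        seen = {}  # distinct substrings of this length -> id
        for start in range(n - length + 1):
            sub = s[start:start + length]
            # setdefault assigns the next unused id the first time `sub` appears
            ids[(start, length)] = seen.setdefault(sub, len(seen))
    return ids


ids = build_substring_ids("abab")
print(ids[(0, 2)] == ids[(2, 2)])  # True: both substrings are "ab"
print(ids[(0, 2)] == ids[(1, 2)])  # False: "ab" versus "ba"
```

This table takes quadratic space and more than quadratic time to build, whereas the abstract claims O(n) space and O(log log n) evaluation per substring.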

Supported by NSF Career Development Award CCR-9501942 and an Alfred P. Sloan Research Fellowship.

Partly supported by DIMACS (Center for Discrete Mathematics and Theoretical Computer Science), and partly supported by ALCOM IT.

Editor information

Dan Hirschberg, Gene Myers

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Farach, M., Muthukrishnan, S. (1996). Perfect hashing for strings: Formalization and algorithms. In: Hirschberg, D., Myers, G. (eds) Combinatorial Pattern Matching. CPM 1996. Lecture Notes in Computer Science, vol 1075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61258-0_11

  • DOI: https://doi.org/10.1007/3-540-61258-0_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61258-2

  • Online ISBN: 978-3-540-68390-2

  • eBook Packages: Springer Book Archive
