Perfect hashing for strings: Formalization and algorithms

  • Martin Farach
  • S. Muthukrishnan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1075)

Abstract

Numbers and strings are two objects manipulated by most programs. Hashing has been well-studied for numbers and it has been effective in practice. In contrast, basic hashing issues for strings remain largely unexplored. In this paper, we identify and formulate the core hashing problem for strings that we call substring hashing. Our main technical results are highly efficient sequential/parallel (CRCW PRAM) Las Vegas type algorithms that determine a perfect hash function for substring hashing. For example, given a binary string of length n, one of our algorithms finds a perfect hash function in O(log n) time, O(n) work, and O(n) space; the hash value for any substring can then be computed in O(log log n) time using a single processor. Our approach relies on a novel use of the suffix tree of a string. In implementing our approach, we design optimal parallel algorithms for the problem of determining weighted ancestors on an edge-weighted tree that may be of independent interest.
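To make the notion of substring hashing concrete, the following is a minimal Python sketch, not the algorithm of the paper. It assumes an uncompacted suffix trie, in which every distinct substring corresponds to exactly one node, and uses the node identifier as the hash value. The names `build_substring_hash`, `children`, and `locus` are hypothetical and chosen for illustration only. The sketch takes quadratic time and space; the abstract's O(n) space and O(log log n) query bounds rely on the compact suffix tree together with weighted ancestor queries, which are not implemented here.

```python
def build_substring_hash(s):
    """Naive illustration of a perfect hash on the substrings of s.

    Returns (h, size), where h(i, j) is an integer in range(size) and
    h(i, j) == h(k, l) exactly when s[i:j] == s[k:l].
    """
    # children[node] maps a character to a child node id.
    # Node 0 is the root and stands for the empty substring.
    children = [{}]
    # locus[i][k] is the trie node reached by reading s[i:i+k] from the root.
    locus = []
    for i in range(len(s)):
        node = 0
        row = [0]
        for ch in s[i:]:
            nxt = children[node].get(ch)
            if nxt is None:
                # First time this substring is seen: allocate a fresh node.
                nxt = len(children)
                children.append({})
                children[node][ch] = nxt
            node = nxt
            row.append(node)
        locus.append(row)
    locus.append([0])  # only the empty substring starts at position len(s)

    def h(i, j):
        # Hash of the substring s[i:j], with 0 <= i <= j <= len(s).
        return locus[i][j - i]

    return h, len(children)


h, size = build_substring_hash("abab")
same = h(0, 2) == h(2, 4)       # both substrings are "ab"
different = h(0, 1) != h(1, 2)  # "a" versus "b"
```

Because each trie node is created once per distinct substring, the values are collision-free and fill the range from 0 to `size - 1`, which is what makes the function perfect in this toy setting.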

Copyright information

© Springer-Verlag 1996

Authors and Affiliations

  • Martin Farach (1)
  • S. Muthukrishnan (2)
  1. Dept. of Computer Sc., Rutgers Univ., Piscataway, USA
  2. Univ. of Warwick, USA
