Fast Prefix Search in Little Space, with Applications

  • Djamal Belazzougui
  • Paolo Boldi
  • Rasmus Pagh
  • Sebastiano Vigna
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6346)

Abstract

A prefix search returns the strings out of a given collection S that start with a given prefix. Traditionally, prefix search is solved by data structures that are also dictionaries, that is, they actually contain the strings in S. For very large collections stored in slow-access memory, we propose extremely compact data structures that solve weak prefix searches, which return the correct result only if some string in S starts with the given prefix. Our data structures for weak prefix search use O(|S| log ℓ) bits in the worst case, where ℓ is the average string length, as opposed to O(|S| ℓ) bits for a dictionary. We show a lower bound implying that this space usage is optimal.
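
To make the definition of prefix search concrete, here is a minimal Python sketch of the traditional, dictionary-based approach the abstract contrasts against. It is not the compact data structure proposed in the paper. It assumes the collection S is held in memory as a lexicographically sorted list, and the function name `prefix_range` is hypothetical. The answer is reported as a half-open range of ranks, since the strings sharing a prefix are contiguous in sorted order.

```python
from bisect import bisect_left


def prefix_range(sorted_strings, prefix):
    """Return (lo, hi) such that sorted_strings[lo:hi] are exactly the
    strings that start with `prefix`. Assumes the list is sorted."""
    # The first string >= prefix is the first candidate match.
    lo = bisect_left(sorted_strings, prefix)
    # Matches are contiguous from lo, so binary-search for the first
    # position at or after lo whose string does not start with prefix.
    left, right = lo, len(sorted_strings)
    while left < right:
        mid = (left + right) // 2
        if sorted_strings[mid].startswith(prefix):
            left = mid + 1
        else:
            right = mid
    return lo, left


S = sorted(["car", "card", "care", "cat", "dog"])
lo, hi = prefix_range(S, "car")
print(S[lo:hi])  # the strings in S that start with "car"
```

This version must store and compare the strings themselves, which is where the O(|S| ℓ) bits of a dictionary come from. A weak prefix search, as described in the abstract, only has to return the correct range when at least one string in S starts with the prefix; when none does (the empty range returned here for a prefix such as "x"), its answer may be arbitrary.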

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Djamal Belazzougui (1)
  • Paolo Boldi (2)
  • Rasmus Pagh (3)
  • Sebastiano Vigna (2)
  1. Université Paris Diderot, Paris 7, France
  2. Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Italy
  3. IT University of Copenhagen, Denmark
