From Theory to Practice: Plug and Play with Succinct Data Structures

  • Simon Gog
  • Timo Beller
  • Alistair Moffat
  • Matthias Petri
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8504)

Abstract

Engineering efficient implementations of compact and succinct structures is time-consuming and challenging, since there is no standard library of easy-to-use, highly optimized, and composable components. One consequence is that measuring the practical impact of new theoretical proposals is difficult, since older baseline implementations may not rely on the same basic components, and reimplementing from scratch can be time-consuming. In this paper we present a framework for experimentation with succinct data structures, providing a large set of configurable components, together with tests, benchmarks, and tools to analyze resource requirements. We demonstrate the functionality of the framework by recomposing two succinct solutions for top-k document retrieval which can operate on both character and integer alphabets.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Simon Gog
    • 1
  • Timo Beller
    • 2
  • Alistair Moffat
    • 1
  • Matthias Petri
    • 1
  1. Dept. Computing and Information Systems, The University of Melbourne, Victoria, Australia
  2. Inst. Theoretical Computer Science, Ulm University, Germany
