Tree Compression with Top Trees Revisited

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9125)

Abstract

We revisit tree compression with top trees (Bille et al. [2]), and present several improvements to the compressor and its analysis. By significantly reducing the amount of information stored and guiding the compression step using a RePair-inspired heuristic, we obtain a fast compressor achieving good compression ratios, addressing an open problem posed by [2]. We show how, with relatively small overhead, the compressed file can be converted into an in-memory representation that supports basic navigation operations in worst-case logarithmic time without decompression. We also show a much improved worst-case bound on the size of the output of top-tree compression (answering an open question posed in a talk on this algorithm by Weimann in 2012).
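For readers unfamiliar with RePair (Larsson and Moffat [15]), the string heuristic that inspires the compression step can be sketched as follows. This is only a naive illustration on strings, assuming a simple quadratic-time loop; it is not the top-tree compressor described in this paper, and the names `repair` and `expand` are hypothetical.

```python
from collections import Counter


def repair(text):
    """Naive RePair on a string: repeatedly replace the most frequent
    adjacent pair of symbols with a fresh nonterminal.

    Nonterminals are tuples ("rule", k) so they cannot collide with
    the single-character terminals of the input.
    """
    seq = list(text)
    rules = {}
    while True:
        # Count adjacent pairs (overlapping occurrences are counted too,
        # which can overestimate the gain but never breaks correctness).
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, nothing left to gain
        name = ("rule", len(rules))
        rules[name] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(seq, rules):
    """Invert repair(): expand nonterminals back into terminals."""
    out = []
    stack = list(reversed(seq))
    while stack:
        sym = stack.pop()
        if sym in rules:
            left, right = rules[sym]
            stack.append(right)
            stack.append(left)
        else:
            out.append(sym)
    return "".join(out)


compressed, grammar = repair("abababab")
restored = expand(compressed, grammar)
```

In this sketch, `compressed` is the shortened start sequence and `grammar` maps each nonterminal to the pair it replaced; `expand` recovers the original string.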

Keywords

Tree compression, Grammar compression, Top trees, XML compression

References

  1. Alstrup, S., Holm, J., Lichtenberg, K.D., Thorup, M.: Maintaining information in fully dynamic trees with top trees. ACM TALG 1(2), 243–264 (2005)
  2. Bille, P., Gørtz, I.L., Landau, G.M., Weimann, O.: Tree compression with top trees. Information and Computation (2015). http://doi.org/10.1016/j.ic.2014.12.012
  3. Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings. In: Proc. SODA, pp. 373–389. SIAM (2011)
  4. Buneman, P., Grohe, M., Koch, C.: Path queries on compressed XML. In: Proc. 29th VLDB, pp. 141–152. VLDB Endowment (2003)
  5. Busatto, G., Lohrey, M., Maneth, S.: Grammar-based tree compression. Tech. Rep. EPFL-REPORT-52615, École Polytechnique Fédérale de Lausanne (2004)
  6. Busatto, G., Lohrey, M., Maneth, S.: Efficient memory representation of XML documents. In: Bierman, G., Koch, C. (eds.) DBPL 2005. LNCS, vol. 3774, pp. 199–216. Springer, Heidelberg (2005)
  7. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans Inf Theory 51(7), 2554–2576 (2005)
  8. Delpratt, O.D.: Space efficient in-memory representation of XML documents. Ph.D. thesis, University of Leicester, supervisor: Rajeev Raman (2009)
  9. Downey, P.J., Sethi, R., Tarjan, R.E.: Variations on the common subexpression problem. Journal of the ACM (JACM) 27(4), 758–771 (1980)
  10. Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Compressing and indexing labeled trees, with applications. Journal of the ACM (JACM) 57(1), 4 (2009)
  11. Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: plug and play with succinct data structures. In: Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 326–337. Springer, Heidelberg (2014)
  12. Hirakawa, M., Tanaka, T., Hashimoto, Y., Kuroda, M., Takagi, T., Nakamura, Y.: JSNP: a database of common gene variations in the Japanese population. Nucleic Acids Research 30(1), 158–162 (2002). http://snp.ims.u-tokyo.ac.jp/XML/Mapped/old/20060612/
  13. Jacobson, G.: Space-efficient static trees and graphs. In: Proc. 30th FOCS, pp. 549–554. IEEE (1989)
  14. Jez, A., Lohrey, M.: Approximation of smallest linear tree grammar. CoRR abs/1309.4958 (2013). http://arxiv.org/abs/1309.4958
  15. Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. Proceedings of the IEEE 88(11), 1722–1732 (2000)
  16. Lohrey, M., Maneth, S.: The complexity of tree automata and XPath on grammar-compressed trees. Theoretical Computer Science 363(2), 196–210 (2006)
  17. Lohrey, M., Maneth, S., Mennicke, R.: XML tree structure compression using RePair. Information Systems 38(8), 1150–1167 (2013)
  18. Maneth, S., Busatto, G.: Tree transducers and tree compressions. In: Walukiewicz, I. (ed.) FOSSACS 2004. LNCS, vol. 2987, pp. 363–377. Springer, Heidelberg (2004)
  19. Maruyama, S., Nakahara, M., Kishiue, N., Sakamoto, H.: ESP-index: A compressed index based on edit-sensitive parsing. Journal of Discrete Algorithms 18, 100–112 (2013)
  20. Miklau, G.: University of Washington XML Repository. http://www.cs.washington.edu/research/xmldatasets
  21. Munro, J.I., Raman, V.: Succinct representation of balanced parentheses and static trees. SIAM Journal on Computing 31(3), 762–776 (2001)
  22. Poyias, A.: XXML: Handling extra-large XML documents (2013). http://hdl.handle.net/2381/27744
  23. Pătraşcu, M.: Succincter. In: Proc. 49th FOCS, pp. 305–313. IEEE (2008)
  24. Wang, F., Li, J., Homayounfar, H.: A space efficient XML DOM parser. Data & Knowledge Engineering 60(1), 185–207 (2007)
  25. Wikimedia: enwiki dump. http://dumps.wikimedia.org/enwiki/

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
  2. Department of Computer Science, University of Leicester, Leicester, UK
