A Compact In-Memory Dictionary for RDF Data

  • Hamid R. Bazoobandi
  • Steven de Rooij
  • Jacopo Urbani
  • Annette ten Teije
  • Frank van Harmelen
  • Henri Bal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9088)

Abstract

While almost all dictionary compression techniques focus on static RDF data, we present a compact in-memory RDF dictionary for dynamic and streaming data. To do so, we analysed the structure of terms in real-world datasets and observed that terms share common prefixes to a high degree. We then studied the applicability of Trie data structures to RDF data as a way to reduce the memory occupied by these common prefixes, and found that all existing Trie implementations lead to either poor performance or excessive memory waste.

In our approach, we address the existing limitations of Tries for RDF data and propose a new Trie variant with optimizations explicitly designed to improve performance on RDF data. Furthermore, we show how to use this Trie as an in-memory dictionary by taking a memory address, instead of an integer counter, as the numerical ID. This design removes the need for an additional decoding data structure and further reduces the occupied memory. An empirical analysis on real-world datasets shows that, with a reasonable overhead, our technique uses 50–59% less memory than a conventional uncompressed dictionary.
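
The idea of using a node's location as the term ID can be illustrated with a minimal sketch. This is not the Trie variant proposed in the paper: it is a plain one-character-per-node trie without any of the paper's optimizations. Python does not expose raw memory addresses in a usable way, so the index of a node in an arena list stands in for the address here. The class name `TrieDictionary`, the parent links used for decoding, and the example.org URIs are all assumptions made for illustration; the abstract does not say how decoding is implemented.

```python
class TrieDictionary:
    """Toy trie dictionary: a term's ID is the arena index of its last node.

    Assumption: the arena index stands in for a memory address, and
    parent links (not described in the abstract) make decoding possible
    without a separate ID-to-string table.
    """

    def __init__(self):
        # Parallel lists form the node arena; node 0 is the root.
        self._parent = [-1]       # arena index of the parent node
        self._char = [""]         # character on the edge from the parent
        self._children = [{}]     # character -> child arena index
        self._terminal = [False]  # True if a stored term ends at this node

    def encode(self, term):
        """Insert the term if it is absent and return its ID."""
        node = 0
        for ch in term:
            nxt = self._children[node].get(ch)
            if nxt is None:
                # Allocate a new node at the end of the arena.
                nxt = len(self._parent)
                self._parent.append(node)
                self._char.append(ch)
                self._children.append({})
                self._terminal.append(False)
                self._children[node][ch] = nxt
            node = nxt
        self._terminal[node] = True
        return node

    def decode(self, term_id):
        """Rebuild the term by walking parent links back to the root."""
        if not (0 <= term_id < len(self._parent)) or not self._terminal[term_id]:
            raise KeyError(term_id)
        chars = []
        node = term_id
        while node != 0:
            chars.append(self._char[node])
            node = self._parent[node]
        return "".join(reversed(chars))

    def node_count(self):
        """Number of allocated nodes, including the root."""
        return len(self._parent)


# Hypothetical terms that share a long prefix.
dictionary = TrieDictionary()
id_a = dictionary.encode("http://example.org/resource/Amsterdam")
id_b = dictionary.encode("http://example.org/resource/Amsterdam_Centraal")
assert dictionary.decode(id_a) == "http://example.org/resource/Amsterdam"
assert dictionary.decode(id_b) == "http://example.org/resource/Amsterdam_Centraal"
```

Because the ID identifies the node itself, decoding needs only the trie, and the shared prefix of the two terms is stored once. A real implementation would likely need to keep nodes from moving in memory so that issued IDs stay valid, which this sketch sidesteps by only appending to the arena.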

Notes

Acknowledgment

This project was partially funded by the COMMIT project, and by the NWO VENI project 639.021.335.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Hamid R. Bazoobandi (1)
  • Steven de Rooij (1, 2)
  • Jacopo Urbani (1, 3)
  • Annette ten Teije (1)
  • Frank van Harmelen (1)
  • Henri Bal (1)

  1. Department of Computer Science, VU University Amsterdam, Amsterdam, The Netherlands
  2. Department of Computer Science, University of Amsterdam, Amsterdam, The Netherlands
  3. Max Planck Institute for Informatics, Saarbrücken, Germany
