HDTQ: Managing RDF Datasets in Compressed Space

  • Javier D. Fernández
  • Miguel A. Martínez-Prieto
  • Axel Polleres
  • Julian Reindorf
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10843)

Abstract

HDT (Header-Dictionary-Triples) is a compressed representation of RDF data that supports retrieval features without prior decompression. Yet, RDF datasets often contain additional graph information, such as the origin, version or validity time of a triple. Traditional HDT cannot handle these additional parameters. This work introduces HDTQ (HDT Quads), an extension of HDT that can represent quadruples (or quads) while remaining highly compact and queryable. Two HDTQ-based approaches are introduced, Annotated Triples and Annotated Graphs, and their performance is compared to the leading open-source RDF stores on the market. Results show that HDTQ achieves the best compression rates and is a competitive alternative to well-established systems.
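
The abstract names the two approaches without describing their internals. As a rough illustration only, the sketch below assumes that "Annotated Triples" means keeping one bitmap per triple (marking the graphs that contain it) and that "Annotated Graphs" means keeping one bitmap per graph (marking the triples it contains). The toy data, the function names and the use of Python integers as bitmaps are all hypothetical and are not taken from the HDTQ implementation.

```python
# Toy quads: (subject, predicate, object, graph). All values are made up.
QUADS = [
    ("ex:alice", "foaf:knows", "ex:bob", "ex:g1"),
    ("ex:alice", "foaf:knows", "ex:bob", "ex:g2"),
    ("ex:bob", "foaf:name", '"Bob"', "ex:g1"),
]


def build_indexes(quads):
    """Assign integer IDs to distinct triples and graphs, then build both layouts."""
    triples = sorted({q[:3] for q in quads})
    graphs = sorted({q[3] for q in quads})
    triple_id = {t: i for i, t in enumerate(triples)}
    graph_id = {g: j for j, g in enumerate(graphs)}

    # Assumed triple-oriented layout: bit j of per_triple[i] is set
    # when triple i appears in graph j.
    per_triple = [0] * len(triples)
    # Assumed graph-oriented layout: bit i of per_graph[j] is set
    # when graph j contains triple i.
    per_graph = [0] * len(graphs)

    for s, p, o, g in quads:
        i, j = triple_id[(s, p, o)], graph_id[g]
        per_triple[i] |= 1 << j
        per_graph[j] |= 1 << i
    return triples, graphs, per_triple, per_graph


def graphs_of(triple, triples, graphs, per_triple):
    """List the graphs that contain a given triple (triple-oriented lookup)."""
    i = triples.index(triple)
    return [g for j, g in enumerate(graphs) if (per_triple[i] >> j) & 1]


def triples_in(graph, triples, graphs, per_graph):
    """List the triples contained in a given graph (graph-oriented lookup)."""
    j = graphs.index(graph)
    return [t for i, t in enumerate(triples) if (per_graph[j] >> i) & 1]
```

Both layouts encode the same triple-to-graph relation. The difference in this sketch is which lookup is direct: the per-triple bitmaps answer "which graphs hold this triple?" with a single access, while the per-graph bitmaps answer "which triples are in this graph?" the same way.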

Notes

Acknowledgements

Supported by the EU’s Horizon 2020 research and innovation programme: grant 731601 (SPECIAL), the Austrian Research Promotion Agency’s (FFG) program “ICT of the Future”: grant 861213 (CitySpin), and MINECO-AEI/FEDER-UE ETOME-RDFD3: TIN2015-69951-R and TIN2016-78011-C4-1-R; Axel Polleres is supported under the Distinguished Visiting Austrian Chair Professors program hosted by The Europe Center of Stanford University. Thanks to Tobias Kuhn for the pointer to the LIDDI dataset.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Vienna University of Economics and Business, Vienna, Austria
  2. Complexity Science Hub Vienna, Vienna, Austria
  3. Department of Computer Science, Universidad de Valladolid, Valladolid, Spain
  4. Stanford University, Stanford, USA
