Compact Representation of Large RDF Data Sets for Publishing and Exchange

  • Javier D. Fernández
  • Miguel A. Martínez-Prieto
  • Claudio Gutierrez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6496)

Abstract

Increasingly large RDF data sets are being published on the Web. Currently, they use different RDF syntaxes, contain high levels of redundancy and have a plain, indivisible structure. All this leads to fuzzy publications, inefficient management, complex processing and lack of scalability. This paper presents a novel RDF representation (HDT) that takes advantage of the structural properties of RDF graphs to split RDF data into three components (Header, Dictionary and Triples structure) and represent each of them efficiently. On-demand management operations can be implemented on top of the HDT representation. Experiments show that HDT compacts data sets by a factor of more than fifteen with respect to the current naive representation, improving parsing and processing while keeping a consistent publication scheme. For exchange, specific compression techniques applied over HDT improve on current compression solutions.
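The split into Header, Dictionary and Triples can be illustrated with a toy Python sketch. It assumes a single shared dictionary that maps every distinct RDF term to an integer ID, and a triples structure that groups ID-encoded triples by subject and then by predicate. The function names (`build_hdt_like`, `objects_of`), the header fields and the example terms are hypothetical; this is a simplification for illustration, not the HDT format defined in the paper.

```python
from collections import defaultdict

# Toy RDF graph as (subject, predicate, object) strings; terms are hypothetical.
TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:alice", "foaf:knows", "ex:carol"),
]


def build_hdt_like(triples):
    """Split triples into (header, dictionary, triples structure); a simplified sketch."""
    triples = set(triples)  # an RDF graph is a set, so duplicate triples are dropped
    # Dictionary: each distinct term gets one integer ID (1-based, in sorted order).
    terms = sorted({term for triple in triples for term in triple})
    dictionary = {term: i for i, term in enumerate(terms, start=1)}
    # Triples structure: ID-encoded, grouped as subject -> predicate -> object IDs,
    # so each subject ID and predicate ID is stored once per group.
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[dictionary[s]][dictionary[p]].append(dictionary[o])
    structure = {
        s: {p: sorted(objs) for p, objs in sorted(preds.items())}
        for s, preds in sorted(grouped.items())
    }
    # Header: metadata describing the data set (field names are hypothetical).
    header = {"num_triples": len(triples), "num_terms": len(terms)}
    return header, dictionary, structure


def objects_of(subject, predicate, dictionary, structure):
    """Hypothetical on-demand operation: answer the pattern (s, p, ?) on the IDs."""
    id_to_term = {i: t for t, i in dictionary.items()}
    s_id, p_id = dictionary.get(subject), dictionary.get(predicate)
    if s_id is None or p_id is None:
        return []
    return [id_to_term[o] for o in structure.get(s_id, {}).get(p_id, [])]


if __name__ == "__main__":
    header, dictionary, structure = build_hdt_like(TRIPLES)
    print(header)  # {'num_triples': 4, 'num_terms': 6}
    print(objects_of("ex:alice", "foaf:knows", dictionary, structure))  # ['ex:bob', 'ex:carol']
```

In this sketch, any space saving would come from replacing repeated terms with small integers and from storing each subject and predicate once per group; an actual size reduction would additionally require a compact binary encoding of the ID streams, which is not shown here.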

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Javier D. Fernández (1)
  • Miguel A. Martínez-Prieto (1, 2)
  • Claudio Gutierrez (2)
  1. Department of Computer Science, Universidad de Valladolid, Spain
  2. Department of Computer Science, Universidad de Chile, Chile