Lightweighting the Web of Data through Compact RDF/HDT

  • Javier D. Fernández
  • Miguel A. Martínez-Prieto
  • Mario Arias
  • Claudio Gutierrez
  • Sandra Álvarez-García
  • Nieves R. Brisaboa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7023)

Abstract

The Web of Data is producing large RDF datasets from diverse fields. The increasing size of the published data threatens to make these datasets hard to exchange, index, and consume. This scalability problem greatly diminishes the potential of interconnected RDF graphs. The HDT format addresses these problems through a compact RDF representation that partitions the data into three efficiently represented components: Header (metadata), Dictionary (strings occurring in the dataset), and Triples (graph structure). This paper revisits the format and exploits the latest findings in triples indexing for querying, exchanging, and visualizing RDF information at large scale.
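The Header/Dictionary/Triples split described above can be illustrated with a minimal sketch. It is not the HDT binary layout: the function names `encode` and `decode`, the single shared ID space, and the header fields are assumptions made for illustration only, and the actual format is more elaborate.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def encode(triples: List[Triple]) -> Tuple[Dict[str, int], Dict[str, int], List[IdTriple]]:
    """Split RDF triples into header, dictionary, and ID-based triples.

    Hypothetical simplification: one shared ID space for all terms.
    """
    # Dictionary: each distinct string is stored once and mapped to an integer ID.
    terms = sorted({term for triple in triples for term in triple})
    dictionary = {term: i + 1 for i, term in enumerate(terms)}

    # Triples: the graph structure, expressed only with integer IDs.
    id_triples = sorted(
        (dictionary[s], dictionary[p], dictionary[o]) for s, p, o in triples
    )

    # Header: metadata about the dataset (illustrative fields only).
    header = {"num_triples": len(id_triples), "num_terms": len(dictionary)}
    return header, dictionary, id_triples


def decode(dictionary: Dict[str, int], id_triples: List[IdTriple]) -> List[Triple]:
    """Recover the original string triples from the ID-based triples."""
    reverse = {i: term for term, i in dictionary.items()}
    return [(reverse[s], reverse[p], reverse[o]) for s, p, o in id_triples]


if __name__ == "__main__":
    data = [
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:bob", "ex:knows", "ex:carol"),
    ]
    header, dictionary, id_triples = encode(data)
    print(header)
    print(id_triples)
    print(decode(dictionary, id_triples))
```

Because long strings are stored once in the dictionary and the triples refer to them by small integers, repeated terms cost little extra space, which is the intuition behind the compactness the abstract claims.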

Keywords

  • Adjacency Matrix
  • SPARQL Query
  • Triple Pattern
  • SPARQL Endpoint
  • Link Open Data Cloud
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Javier D. Fernández (1)
  • Miguel A. Martínez-Prieto (1, 2)
  • Mario Arias (1)
  • Claudio Gutierrez (2)
  • Sandra Álvarez-García (3)
  • Nieves R. Brisaboa (3)
  1. Universidad de Valladolid, Spain
  2. Universidad de Chile, Chile
  3. Universidade da Coruña, Spain
