Linked Data Management

  • Manfred Hauswirth
  • Marcin Wylot
  • Martin Grund
  • Paul Groth
  • Philippe Cudré-Mauroux
Chapter

Abstract

The size of Linked Data is growing exponentially, so a Linked Data management system has to cope with ever-increasing amounts of data. In the Linked Data context, variety is also especially important. In spite of its seemingly simple data model, Linked Data actually encodes rich and complex graphs that mix instance-level and schema-level data. Since Linked Data is schema-free (i.e., the schema is not strict), standard database techniques cannot be directly adopted to manage it. Even though Linked Data can be organized in the form of a table, querying a giant triple table becomes very costly because of the multiple nested joins required by typical queries. The heterogeneity of Linked Data also poses entirely new challenges to database systems, where managing provenance information is becoming a requirement. Linked Data queries usually involve multiple sources, and results can be produced in various ways for a specific scenario. Such heterogeneous data can incorporate knowledge about provenance, which can be leveraged to give users a reliable and understandable description of how a query result was derived, and to improve query execution performance thanks to the high selectivity of provenance information. In this chapter, we provide a detailed overview of current approaches specifically designed for Linked Data management. We focus on storage models, indexing techniques, and query execution strategies. Finally, we provide an overview of provenance models, definitions, and serialization techniques for Linked Data. We also survey the database management systems that implement techniques to manage provenance information in the context of Linked Data.
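To make the cost of the triple-table layout concrete, the following is a minimal sketch in Python, not taken from the chapter. The data, the predicate names (type, worksFor, locatedIn), and the helper functions scan and people_working_in are all hypothetical and chosen only for illustration. It shows how a query with three triple patterns turns into three nested scans (self-joins) over the same table.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy data: one flat "triple table" holding everything.
TRIPLES: List[Triple] = [
    ("alice", "type", "Person"),
    ("alice", "worksFor", "acme"),
    ("acme", "locatedIn", "Berlin"),
    ("bob", "type", "Person"),
    ("bob", "worksFor", "initech"),
    ("initech", "locatedIn", "Austin"),
]


def scan(table: List[Triple],
         s: Optional[str] = None,
         p: Optional[str] = None,
         o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard.

    Without any index this is a full pass over the table.
    """
    for ts, tp, to in table:
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o):
            yield (ts, tp, to)


def people_working_in(table: List[Triple], city: str) -> List[str]:
    """Answer: which persons work for a company located in `city`?

    Each triple pattern of the query becomes one more nested scan
    (a self-join) over the same table.
    """
    results: List[str] = []
    for person, _, _ in scan(table, p="type", o="Person"):            # pattern 1
        for _, _, company in scan(table, s=person, p="worksFor"):     # pattern 2
            for _ in scan(table, s=company, p="locatedIn", o=city):   # pattern 3
                results.append(person)
    return results


if __name__ == "__main__":
    print(people_working_in(TRIPLES, "Berlin"))
```

In this naive form every additional triple pattern adds another pass over the whole table, which is the kind of cost that dedicated storage models and indexing techniques aim to reduce.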

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Manfred Hauswirth (1)
  • Marcin Wylot (1)
  • Martin Grund (1)
  • Paul Groth (1)
  • Philippe Cudré-Mauroux (1)

  1. Technical University of Berlin (TU Berlin), Berlin, Germany
