Federated Data Management and Query Optimization for Linked Open Data

  • Chapter
New Directions in Web Data Management 1

Part of the book series: Studies in Computational Intelligence (SCI, volume 331)

Abstract

Linked Open Data provides data on the web in a machine-readable way, with typed links between related entities. Means of accessing Linked Open Data include crawling, searching, and querying. Search in Linked Open Data allows for more than just keyword-based, document-oriented data retrieval. Only complex queries across different data sources can leverage the full potential of Linked Open Data. In this sense, Linked Open Data is similar to distributed or federated databases, but with less cooperation between the data sources, which are maintained independently and may update their data without notice. Since Linked Open Data is based on standards like the RDF format and the SPARQL query language, it is possible to implement a federation infrastructure without the need for specific data wrappers. However, some design issues of the current SPARQL standard limit the efficiency and applicability of query execution strategies. In this chapter we consider some details and implications of these limitations and present an improved query optimization approach based on dynamic programming.
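
To illustrate what dynamic-programming query optimization means in general, the following is a minimal sketch of classic join-order enumeration over subsets of data sources. It is not the chapter's algorithm: the function name `best_join_order`, the cost model (sum of intermediate result sizes), and the example cardinalities and selectivities are all assumptions made for illustration.

```python
from itertools import combinations


def best_join_order(cardinalities, selectivity):
    """Return (cost, plan) for the cheapest bushy join tree found by DP.

    cardinalities: dict mapping a source name to its estimated result size.
    selectivity:   dict mapping frozenset({a, b}) to a join selectivity;
                   missing pairs are treated as cross products (1.0).
    Cost model (an assumption): sum of all intermediate result sizes.
    """
    names = sorted(cardinalities)
    # best[subset] = (cost, estimated result size, plan)
    best = {frozenset([n]): (0.0, float(cardinalities[n]), n) for n in names}

    def combined_selectivity(left, right):
        s = 1.0
        for a in left:
            for b in right:
                s *= selectivity.get(frozenset((a, b)), 1.0)
        return s

    # Build optimal plans for subsets of increasing size from smaller ones.
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            subset = frozenset(combo)
            for k in range(1, size):
                for left_combo in combinations(combo, k):
                    left = frozenset(left_combo)
                    right = subset - left
                    lcost, lsize, lplan = best[left]
                    rcost, rsize, rplan = best[right]
                    out = lsize * rsize * combined_selectivity(left, right)
                    cost = lcost + rcost + out
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, out, (lplan, rplan))

    cost, _, plan = best[frozenset(names)]
    return cost, plan


# Hypothetical statistics for three sources A, B, and C.
example_cost, example_plan = best_join_order(
    {"A": 1000, "B": 10, "C": 100},
    {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.01},
)
```

With these made-up statistics, the cheapest plan joins the two smaller sources B and C first and only then joins the result with A. In a federated setting the cost model would also need to account for network transfer and for statistics that may be missing or stale, which this sketch ignores.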

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Görlitz, O., Staab, S. (2011). Federated Data Management and Query Optimization for Linked Open Data. In: Vakali, A., Jain, L.C. (eds) New Directions in Web Data Management 1. Studies in Computational Intelligence, vol 331. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17551-0_5

  • DOI: https://doi.org/10.1007/978-3-642-17551-0_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17550-3

  • Online ISBN: 978-3-642-17551-0

  • eBook Packages: Engineering, Engineering (R0)
