Querying Distributed RDF Data Sources with SPARQL

  • Bastian Quilitz
  • Ulf Leser
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5021)

Abstract

Integrated access to multiple distributed and autonomous RDF data sources is a key challenge for many semantic web applications. In response to this challenge, SPARQL, the W3C Recommendation for an RDF query language, supports querying multiple RDF graphs. However, the current standard does not provide transparent query federation, which makes queries hard and lengthy to formulate. Furthermore, current implementations of SPARQL load all RDF graphs mentioned in a query onto the local machine. This usually incurs a large overhead in network traffic, and is sometimes simply impossible for technical or legal reasons. To overcome these problems we present DARQ, an engine for federated SPARQL queries. DARQ provides transparent query access to multiple SPARQL services, i.e., it gives the user the impression of querying a single RDF graph even though the real data is distributed across the web. A service description language enables the query engine to decompose a query into sub-queries, each of which can be answered by an individual service. DARQ also uses query rewriting and cost-based query optimization to speed up query execution. Experiments show that these optimizations significantly improve query performance even when only a very limited amount of statistical information is available. DARQ is available under the GPL license at http://darq.sf.net/.
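
The decomposition step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each service description is modeled as a plain set of predicates the service can answer, and triple patterns are assigned to services by predicate alone. The endpoint URLs and the names `SERVICES`, `decompose`, and `to_sparql` are hypothetical and are not part of DARQ.

```python
from collections import defaultdict

# Hypothetical service descriptions: endpoint URL -> predicates it can answer.
# (Assumption: a real service description language may carry more detail.)
SERVICES = {
    "http://example.org/people/sparql": {"foaf:name", "foaf:mbox"},
    "http://example.org/papers/sparql": {"dc:creator", "dc:title"},
}


def decompose(triple_patterns, services):
    """Group triple patterns into one sub-query per capable service."""
    sub_queries = defaultdict(list)
    for pattern in triple_patterns:
        _subject, predicate, _object = pattern
        # Find every service whose description lists this predicate.
        matches = [url for url, preds in services.items() if predicate in preds]
        if not matches:
            raise ValueError(f"no service can answer {pattern}")
        for url in matches:
            sub_queries[url].append(pattern)
    return dict(sub_queries)


def to_sparql(patterns):
    """Render a list of triple patterns as a simple SELECT sub-query."""
    body = " . ".join(" ".join(p) for p in patterns)
    return f"SELECT * WHERE {{ {body} }}"


# One user query whose patterns are spread across two services.
query = [
    ("?person", "foaf:name", "?name"),
    ("?paper", "dc:creator", "?person"),
    ("?paper", "dc:title", "?title"),
]

plan = decompose(query, SERVICES)
for url, patterns in plan.items():
    print(url, "->", to_sparql(patterns))
```

In this sketch the user writes a single query and never names an endpoint; the shared variable `?person` is what a federated engine would later use to join the partial results. Join ordering and cost estimation, which the abstract attributes to the optimizer, are deliberately left out.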


Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Bastian Quilitz (1)
  • Ulf Leser (1)

  1. Humboldt-Universität zu Berlin
