On the Selection of SPARQL Endpoints to Efficiently Execute Federated SPARQL Queries

Transactions on Large-Scale Data- and Knowledge-Centered Systems XXV

Part of the book series: Lecture Notes in Computer Science ((TLDKS,volume 9620))

Abstract

We consider the problem of source selection and query decomposition in federations of SPARQL endpoints, where the decompositions of a SPARQL query should reduce execution time and maximize answer completeness. This problem is in general intractable, and the performance and answer completeness of SPARQL queries can be considerably affected when the number of SPARQL endpoints in a federation increases. We devise a formalization of this problem as the Vertex Coloring Problem and propose an approximate algorithm named Fed-DSATUR. We rely on existing results from graph theory to characterize the family of SPARQL queries for which Fed-DSATUR can produce optimal decompositions in time polynomial in the size of the query, i.e., in the number of SPARQL triple patterns in the query. Fed-DSATUR scales up much better to SPARQL queries with a large number of triple patterns, and may exhibit significant improvements in performance while answer completeness remains close to 100%. More importantly, we put our results in perspective, and provide evidence of SPARQL queries that are hard to decompose and constitute new challenges for data management.
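To make the vertex coloring framing concrete, the following is a minimal sketch of the classic DSATUR greedy heuristic (Brélaz, 1979) on which the name Fed-DSATUR is based. It is not the authors' Fed-DSATUR algorithm: the function name `dsatur_coloring`, the adjacency-dictionary input, and the reading of vertices as triple patterns and colors as subqueries are illustrative assumptions rather than the chapter's formal definitions.

```python
from typing import Dict, Hashable, List, Set


def dsatur_coloring(graph: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Color an undirected graph with the generic DSATUR greedy heuristic.

    `graph` maps each vertex to the set of its neighbors (assumed symmetric).
    Returns a mapping from vertex to a color index starting at 0.
    """
    colors: Dict[Hashable, int] = {}
    # Keep insertion order so that ties are broken deterministically.
    uncolored: List[Hashable] = list(graph)

    def priority(v: Hashable):
        # Saturation: number of distinct colors already used by neighbors.
        saturation = len({colors[u] for u in graph[v] if u in colors})
        # Tie-break: degree in the still-uncolored part of the graph.
        open_degree = sum(1 for u in graph[v] if u not in colors)
        return (saturation, open_degree)

    while uncolored:
        v = max(uncolored, key=priority)
        used = {colors[u] for u in graph[v] if u in colors}
        color = 0
        while color in used:  # smallest color not used by any neighbor
            color += 1
        colors[v] = color
        uncolored.remove(v)
    return colors


# Hypothetical example: t1..t4 stand for triple patterns, and an edge marks
# two patterns that (by assumption) should not share a subquery.
example = {
    "t1": {"t2", "t3"},
    "t2": {"t1", "t3"},
    "t3": {"t1", "t2", "t4"},
    "t4": {"t3"},
}
assignment = dsatur_coloring(example)
```

Under this reading, vertices that receive the same color could be grouped into one subquery, so fewer colors would correspond to fewer subqueries. DSATUR is a heuristic and is not guaranteed to use the minimum number of colors on arbitrary graphs, although it is known to be exact on bipartite graphs.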

Notes

  1. http://www.earthobservatory.eu/.

  2. https://www.openphacts.org/.

  3. http://geoknow.eu/.

  4. http://www.w3.org/TR/rdf-sparql-protocol/.

  5. http://lov.okfn.org/dataset/lov.

  6. http://xmlns.com/foaf/spec/.

  7. http://www.w3.org/TR/void/.

  8. http://www.linkedmdb.org/.

  9. http://dbpedia.org/About.

  10. http://www.geonames.org/.

  11. http://data.nytimes.com/.

  12. http://data.semanticweb.org/.

  13. http://dbtune.org/jamendo/.

  14. http://www.drugbank.ca/.

  15. http://www.genome.jp/kegg/.

  16. http://www.ebi.ac.uk/chebi/.

  17. SPARQL queries with different SPARQL operators, e.g., UNION or OPTIONAL.

  18. http://iwb.fluidops.com:7879/resource/Datasets, November 2011.

  19. pred(t) returns the predicate of the triple pattern t.

  20. cost(D) is monotonic w.r.t. the number of subgoals in DP if and only if the values of cost(D) monotonically increase with the number of subgoals in DP, i.e., for all \(D=(DP,f,g)\) and \(D'=(DP',f',g')\) such that \(\mid DP \mid < \mid DP' \mid\) one has \(\textit{cost}(D) < \textit{cost}(D')\). A sufficient condition for the function cost(.) to be monotonic is that the query comprises only triple patterns that can be evaluated against only one endpoint.

  21. http://iwb.fluidops.com:7879/resource/Datasets, November 2011.

  22. http://silurian.thalassa.cbm.usb.ve/.

  23. As indicated in Theorem 2, these decompositions can be optimal depending on the property of monotonicity of the function cost(.).

  24. http://jena.sourceforge.net/ARQ.

  25. https://github.com/seagent/WoDQA/.

Author information

Corresponding author

Correspondence to Maria-Esther Vidal.

A Additional Complex Queries

We have defined a set of ten additional queries that comprise a large number of triple patterns, basic graph patterns (BGPs), and different SPARQL operators. This extended setup evaluates the effects of the selectivity of BGPs, a large number of triple patterns, and the number of SPARQL operators. The additional queries are composed of between 6 and 46 triple patterns and can be decomposed into up to 9 subqueries.

[Figures d to k are not reproduced here.]

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Vidal, ME., Castillo, S., Acosta, M., Montoya, G., Palma, G. (2016). On the Selection of SPARQL Endpoints to Efficiently Execute Federated SPARQL Queries. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXV. Lecture Notes in Computer Science, vol 9620. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49534-6_4

  • DOI: https://doi.org/10.1007/978-3-662-49534-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49533-9

  • Online ISBN: 978-3-662-49534-6

  • eBook Packages: Computer Science, Computer Science (R0)
