On the Selection of SPARQL Endpoints to Efficiently Execute Federated SPARQL Queries

Vidal, Maria-Esther; Castillo, Simón; Acosta, Maribel; Montoya, Gabriela; Palma, Guillermo

doi:10.1007/978-3-662-49534-6_4


Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 9620)

Abstract

We consider the problem of source selection and query decomposition in federations of SPARQL endpoints, where the decomposition of a SPARQL query should reduce execution time and maximize answer completeness. This problem is in general intractable, and both the performance and the answer completeness of SPARQL queries can degrade considerably as the number of SPARQL endpoints in a federation increases. We formalize this problem as the Vertex Coloring Problem and propose an approximate algorithm named Fed-DSATUR. We rely on existing results from graph theory to characterize the family of SPARQL queries for which Fed-DSATUR can produce optimal decompositions in time polynomial in the size of the query, i.e., in the number of SPARQL triple patterns in the query. Fed-DSATUR scales up much better to SPARQL queries with a large number of triple patterns, and may exhibit significant improvements in performance while answer completeness remains close to 100%. More importantly, we put our results in perspective, and provide evidence of SPARQL queries that are hard to decompose and constitute new challenges for data management.
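To make the coloring view concrete, the following is a minimal sketch of the classic DSATUR greedy heuristic (Brélaz, 1979) on which Fed-DSATUR builds. It is not the Fed-DSATUR algorithm itself: the abstract does not spell out how vertices and edges are derived from triple patterns and endpoints, so the example graph, the vertex names `t1` to `t4`, and the reading of each color as one subquery are assumptions made only for illustration.

```python
def dsatur(adjacency):
    """Greedy DSATUR coloring.

    adjacency maps each vertex to the set of its neighbors.
    Returns a dict assigning a color (0, 1, 2, ...) to every vertex.
    """
    colors = {}
    uncolored = set(adjacency)

    def priority(v):
        # Saturation: number of distinct colors already used by neighbors.
        saturation = len({colors[u] for u in adjacency[v] if u in colors})
        # Tie-breaker: degree in the subgraph of still-uncolored vertices.
        degree = sum(1 for u in adjacency[v] if u in uncolored)
        return (saturation, degree)

    while uncolored:
        # Sorting makes the choice deterministic when priorities tie.
        v = max(sorted(uncolored), key=priority)
        used = {colors[u] for u in adjacency[v] if u in colors}
        color = 0
        while color in used:
            color += 1  # smallest color not used by any neighbor
        colors[v] = color
        uncolored.remove(v)
    return colors


# Hypothetical conflict graph: an edge means "these two vertices must not
# share a color" (assumed here to mean "must not share a subquery").
example = {
    "t1": {"t2", "t3"},
    "t2": {"t1", "t3"},
    "t3": {"t1", "t2", "t4"},
    "t4": {"t3"},
}
coloring = dsatur(example)
print(coloring)
```

Under this reading, vertices that receive the same color would be grouped together, and the number of colors bounds the number of groups. DSATUR is a heuristic, so on general graphs the number of colors it uses need not be minimal.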

Notes

1. http://www.earthobservatory.eu/.
2. https://www.openphacts.org/.
3. http://geoknow.eu/.
4. http://www.w3.org/TR/rdf-sparql-protocol/.
5. http://lov.okfn.org/dataset/lov.
6. http://xmlns.com/foaf/spec/.
7. http://www.w3.org/TR/void/.
8. http://www.linkedmdb.org/.
9. http://dbpedia.org/About.
10. http://www.geonames.org/.
11. http://data.nytimes.com/.
12. http://data.semanticweb.org/.
13. http://dbtune.org/jamendo/.
14. http://www.drugbank.ca/.
15. http://www.genome.jp/kegg/.
16. http://www.ebi.ac.uk/chebi/.
17. SPARQL queries with different SPARQL operators, e.g., UNION or OPTIONAL.
18. http://iwb.fluidops.com:7879/resource/Datasets, November 2011.
19. pred(t) returns the predicate of the triple pattern t.
20. cost(D) is monotonic w.r.t. the number of subgoals in DP if and only if the values of cost(D) increase monotonically with the number of subgoals in DP, i.e., for all \(D=(DP,f,g)\) and \(D'=(DP',f',g')\) such that \(|DP| < |DP'|\), one has \(\textit{cost}(D) < \textit{cost}(D')\). A sufficient condition for the function cost(.) to be monotonic is that the query comprises only triple patterns that can be evaluated against only one endpoint.
21. http://iwb.fluidops.com:7879/resource/Datasets, November 2011.
22. http://silurian.thalassa.cbm.usb.ve/.
23. As indicated in Theorem 2, these decompositions can be optimal depending on the property of monotonicity of the function cost(.).
24. http://jena.sourceforge.net/ARQ.
25. https://github.com/seagent/WoDQA/.

Author information

Authors and Affiliations

Universidad Simón Bolívar, Caracas, Venezuela
Maria-Esther Vidal, Simón Castillo & Guillermo Palma
Institute AIFB, Karlsruhe Institute of Technology, Karlsruhe, Germany
Maribel Acosta
University of Nantes, Nantes, France
Gabriela Montoya

Corresponding author

Correspondence to Maria-Esther Vidal.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
FAW, University of Linz, Linz, Austria
Roland Wagner

A Additional Complex Queries

We have defined a set of ten additional queries that comprise a large number of triple patterns, basic graph patterns (BGPs), and different SPARQL operators. This extended setup evaluates the effects of the selectivity of BGPs, of a large number of triple patterns, and of the number of SPARQL operators. The additional queries are composed of between 6 and 46 triple patterns and can be decomposed into up to 9 subqueries.

Cite this chapter

Vidal, M.-E., Castillo, S., Acosta, M., Montoya, G., Palma, G. (2016). On the Selection of SPARQL Endpoints to Efficiently Execute Federated SPARQL Queries. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXV. Lecture Notes in Computer Science, vol 9620. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49534-6_4

DOI: https://doi.org/10.1007/978-3-662-49534-6_4
Published: 20 February 2016
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49533-9
Online ISBN: 978-3-662-49534-6
eBook Packages: Computer Science, Computer Science (R0)
