Processing Aggregate Queries in a Federation of SPARQL Endpoints

  • Dilshod Ibragimov
  • Katja Hose
  • Torben Bach Pedersen
  • Esteban Zimányi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9088)

Abstract

More and more RDF data is being exposed on the Web via SPARQL endpoints. With the recent SPARQL 1.1 standard, these datasets can be queried in novel and more powerful ways: complex analysis tasks involving grouping and aggregation, even over data from multiple SPARQL endpoints, can now be formulated in a single query. This enables Business Intelligence applications that access data from federated web sources and combine it with local data. However, because both aggregate and federated queries have only recently become available, state-of-the-art systems lack sophisticated optimization techniques for executing such queries efficiently over large datasets. To overcome these shortcomings, we propose a set of query processing strategies and the associated Cost-based Optimizer for Distributed Aggregate queries (CoDA) for executing aggregate SPARQL queries over federations of SPARQL endpoints. Our comprehensive experiments show that CoDA significantly improves performance over current state-of-the-art systems.
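To illustrate what evaluating an aggregate query over several endpoints involves, here is one common distributed-aggregation technique (not necessarily the strategy CoDA uses): each endpoint evaluates a grouped sub-query such as `SELECT ?key (SUM(?v) AS ?s) (COUNT(?v) AS ?c) WHERE { ... } GROUP BY ?key` (for example through a SPARQL 1.1 `SERVICE` clause), and the partial results are merged locally. This is a minimal sketch, assuming each endpoint's result has already been fetched as `(key, partial_sum, partial_count)` tuples; the function name `merge_partial_aggregates` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed shape of one endpoint's result row: (group_key, partial_sum, partial_count),
# as produced by a hypothetical sub-query with SUM(?v), COUNT(?v) and GROUP BY ?key.
PartialRow = Tuple[str, float, int]


def merge_partial_aggregates(
    endpoint_results: Iterable[List[PartialRow]],
) -> Dict[str, Dict[str, float]]:
    """Combine per-endpoint partial aggregates into global SUM, COUNT and AVG."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for rows in endpoint_results:
        for key, partial_sum, partial_count in rows:
            # SUM and COUNT are decomposable: the global value is the sum of partials.
            sums[key] += partial_sum
            counts[key] += partial_count
    # AVG is not directly decomposable, but it can be recovered as global SUM / COUNT.
    return {
        key: {"sum": sums[key], "count": counts[key], "avg": sums[key] / counts[key]}
        for key in sums
        if counts[key] > 0
    }


# Hypothetical partial results returned by two endpoints.
endpoint_a = [("DK", 100.0, 4), ("BE", 50.0, 2)]
endpoint_b = [("DK", 20.0, 1), ("FR", 30.0, 3)]
result = merge_partial_aggregates([endpoint_a, endpoint_b])
print(result["DK"])  # {'sum': 120.0, 'count': 5, 'avg': 24.0}
```

Shipping per-group partial aggregates instead of raw triples can reduce the data transferred from each endpoint. Whether it pays off depends on how many groups there are relative to raw rows, which is the kind of trade-off a cost-based optimizer would weigh.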

Notes

Acknowledgment

This research is partially funded by the Erasmus Mundus Joint Doctorate in “Information Technologies for Business Intelligence – Doctoral College (IT4BI-DC)”.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Dilshod Ibragimov (1, 2)
  • Katja Hose (2)
  • Torben Bach Pedersen (2)
  • Esteban Zimányi (1)
  1. Université Libre de Bruxelles, Brussels, Belgium
  2. Aalborg University, Aalborg, Denmark