Parallelizing Federated SPARQL Queries in Presence of Replicated Data

  • Thomas Minier
  • Gabriela Montoya
  • Hala Skaf-Molli
  • Pascal Molli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10577)

Abstract

Federated query engines, e.g., Fedra, have been enhanced to exploit the new data localities created by replicated data. However, existing replication-aware federated query engines mainly focus on pruning sources during source selection and query decomposition, using data locality to reduce intermediate results. In this paper, we implement a replication-aware parallel join operator: Pen. This operator can be used to exploit replicated data during query execution. For existing replication-aware federated query engines, this operator exploits replicated data to parallelize the execution of joins and reduce execution time. For Triple Pattern Fragment (TPF) clients, this operator exploits the availability of several TPF servers exposing the same dataset to share the load among the servers. We implemented Pen in the federated query engine FedX with the replication-aware source selection Fedra, and in the reference TPF client. We empirically evaluated the performance of engines extended with the Pen operator. The experimental results suggest that our extensions outperform the existing approaches in terms of execution time and balance of load among the servers, respectively.
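The idea of parallelizing a join over replicated sources can be illustrated with a minimal sketch. It is not the authors' implementation of Pen: the `Replica` class, the `parallel_bound_join` function, the in-memory triples, and the round-robin assignment are all assumptions made for illustration. Each binding produced by the left side of the join is probed against one of several replicas holding the same data, so the probes run concurrently and the request load is spread across replicas.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from threading import Lock


class Replica:
    """Hypothetical in-memory stand-in for one server holding a copy of the data."""

    def __init__(self, name, triples):
        self.name = name
        self.triples = triples
        self.calls = 0          # number of requests served, to observe load balance
        self._lock = Lock()     # protects the counter when probes run concurrently

    def match(self, subject, predicate):
        """Return every object o such that (subject, predicate, o) is stored."""
        with self._lock:
            self.calls += 1
        return [o for (s, p, o) in self.triples if s == subject and p == predicate]


def parallel_bound_join(left_bindings, join_var, predicate, out_var, replicas):
    """Join each left binding with the pattern (?join_var predicate ?out_var).

    Successive bindings are assigned to replicas in round-robin order
    (an assumed policy) and probed concurrently.
    """
    def probe(task):
        binding, replica = task
        objects = replica.match(binding[join_var], predicate)
        return [{**binding, out_var: o} for o in objects]

    tasks = list(zip(left_bindings, cycle(replicas)))
    results = []
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        # pool.map yields results in task order, so output order is deterministic
        for partial in pool.map(probe, tasks):
            results.extend(partial)
    return results


# Illustrative data: two replicas exposing the same triples.
triples = [("b", "name", "B"), ("c", "name", "C")]
r1, r2 = Replica("r1", triples), Replica("r2", triples)
left = [{"x": "a", "y": "b"}, {"x": "a", "y": "c"}]
joined = parallel_bound_join(left, "y", "name", "n", [r1, r2])
print(joined)
print(r1.calls, r2.calls)
```

With two left bindings and two replicas, each replica serves one probe. A real engine would also need to account for sources that hold only some fragments and for network failures, which this sketch ignores.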

Keywords

Linked Data; Parallel query processing; Fragment replication; Federated SPARQL Queries Processing; Triple Pattern Fragment; Load balancing

Notes

Acknowledgments

This work is partially supported through the FaBuLA project, part of the AtlanSTIC 2020 program.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Thomas Minier (1)
  • Gabriela Montoya (2)
  • Hala Skaf-Molli (1)
  • Pascal Molli (1)
  1. LS2N, Nantes University, Nantes, France
  2. Department of Computer Science, Aalborg University, Aalborg, Denmark
