Querying Large Knowledge Graphs over Triple Pattern Fragments: An Empirical Study

  • Lars Heling
  • Maribel Acosta
  • Maria Maleshkova
  • York Sure-Vetter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11137)

Abstract

Triple Pattern Fragments (TPFs) are a novel interface for accessing data in knowledge graphs on the web. So far, work on performance evaluation and optimization has focused mainly on SPARQL query execution over TPF servers. However, to devise querying techniques that efficiently access large knowledge graphs via TPFs, we need to identify and understand, at a fine-grained level, the variables that influence the performance of TPF servers. In this work, we assess the performance of TPFs by measuring the response time for different requests and analyze how the requests’ properties, as well as the TPF server configuration, may impact performance. For this purpose, we developed the Triple Pattern Fragment Profiler to determine the performance of TPF servers. The resource is openly available at https://doi.org/10.5281/zenodo.1211621. Using this profiler, we conduct an empirical study over four large knowledge graphs in different server environments and configurations. As part of our analysis, we provide an extensive evaluation of the results and focus on the impact of the following variables on the response time: triple pattern type, answer cardinality, page size, backend, and environment type. The results suggest that all of these variables affect the measured response time, and they allow us to derive suggestions for TPF server configurations and query optimization.
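To make the measurement setup described in the abstract concrete, the following is a minimal Python sketch of timing a single TPF request. It is not the authors' Triple Pattern Fragment Profiler. All function names are hypothetical, the 'v'/'c' labelling of triple pattern types is an assumed convention, and the `subject`, `predicate`, `object`, and `page` query parameters are assumed to match the server's URL template (a real client would normally discover them from the hypermedia controls in the fragment).

```python
import time
from urllib.parse import urlencode


def is_variable(term):
    """Treat None or a '?name' string as a variable (assumed convention)."""
    return term is None or term.startswith("?")


def pattern_type(subject, predicate, obj):
    """Label a triple pattern by position: 'v' for variable, 'c' for constant."""
    return "".join("v" if is_variable(t) else "c" for t in (subject, predicate, obj))


def build_request_url(fragment_url, subject=None, predicate=None, obj=None, page=1):
    """Build a request URL, assuming subject/predicate/object/page parameters."""
    named_terms = (("subject", subject), ("predicate", predicate), ("object", obj))
    # Only constants are sent; variable positions are left unconstrained.
    params = [(name, term) for name, term in named_terms if not is_variable(term)]
    params.append(("page", str(page)))
    return fragment_url + "?" + urlencode(params)


def measure_response_times(fetch, url, repetitions=3):
    """Call fetch(url) repeatedly and return the elapsed wall-clock seconds."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        fetch(url)
        timings.append(time.perf_counter() - start)
    return timings
```

Passing `fetch` as a parameter keeps the timing loop independent of any particular HTTP client. For a real measurement, a function that issues a GET request and reads the full response body could be supplied; a study of page size or backend effects would repeat this over many patterns and server configurations.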

Notes

Acknowledgments

The authors thank Ruben Verborgh for providing feedback and the KG dumps and Javier Fernández for the fruitful discussions about HDT. This work was carried out with the support of the German Research Foundation (DFG) within the project “Sozial-Raumwissenschaftliche Forschungsdateninfrastruktur (SoRa)”.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Lars Heling (1)
  • Maribel Acosta (1)
  • Maria Maleshkova (1)
  • York Sure-Vetter (1)

  1. Institute AIFB, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany