In-Database Graph Analytics with Recursive SPARQL

  • Conference paper
The Semantic Web – ISWC 2020 (ISWC 2020)

Abstract

Work on knowledge graphs and graph-based data management often focuses either on graph query languages or on frameworks for graph analytics, with little work on combining the two approaches. However, many real-world tasks conceptually involve combinations of these approaches: a graph query can be used to select the appropriate data, which is then enriched with analytics, and then possibly filtered or combined again with other data by means of a query language. In this paper we propose a language that is well-suited for both graph querying and analytical tasks. Specifically, we propose a minimalistic extension of SPARQL that allows analytical tasks to be expressed over existing SPARQL infrastructure: we extend SPARQL with recursive features, and provide a formal syntax and semantics for the resulting language. We show that this language can express key analytical tasks on graphs (in fact, it is Turing complete). Moreover, queries in this language can also be compiled into sequences of iterations of SPARQL update statements. We show how procedures in our language can be implemented over off-the-shelf SPARQL engines, with a specialised client that can leverage database operations to improve the performance of queries. Results for our implementation show that procedures for popular analytics currently run in seconds or minutes for selective sub-graphs (our target use-case).
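
The query, analytics, filter pipeline described in the abstract can be sketched in a few lines. The following is only an illustration under stated assumptions: plain Python stands in for a SPARQL engine, the triples, the predicate name `cites`, and the helper names `select_subgraph` and `pagerank` are all hypothetical, and none of this is the syntax or implementation of the proposed language. Each pass of the loop plays the role that one iteration of an update statement would play.

```python
# Illustrative sketch only: plain Python stands in for a SPARQL engine.
# The data, predicate names, and function names are hypothetical.

def select_subgraph(triples, predicate):
    """Query step: keep only the edges that use the given predicate."""
    return {(s, o) for (s, p, o) in triples if p == predicate}

def pagerank(edges, damping=0.85, iterations=20):
    """Analytics step: PageRank computed by a fixed number of iterations."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [t for (s, t) in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for s in nodes:
            for t in out_links[s]:
                new_rank[t] += damping * rank[s] / len(out_links[s])
        rank = new_rank
    return rank

triples = {
    ("a", "cites", "b"),
    ("c", "cites", "b"),
    ("b", "cites", "d"),
    ("a", "authoredBy", "x"),
}

edges = select_subgraph(triples, "cites")   # 1. select data with a query
ranks = pagerank(edges)                     # 2. enrich it with analytics
above_average = {n for n, r in ranks.items() if r > 1 / len(ranks)}  # 3. filter
```

Keeping all three steps in one place mirrors the motivation given in the abstract: the selected sub-graph never has to be exported to a separate analytics framework.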

Notes

  1. https://www.wikidata.org/, or see endpoint at https://query.wikidata.org/.

  2. Though more complex forms of “navigational patterns” have been proposed in the literature, they are mostly limited to path-finding and reachability [30].

  3. A syntactic way of doing this is to use a command in SPARQL.

  4. This corresponds to boolean evaluation. This is without loss of generality because the problem where one considers a tuple of values as an input can be simulated by means of filters.

  5. Here we are not interested in languages with decidable containment, in part because we are not addressing how to do reasoning within SPARQAL, but this is a fertile area for future work.

  6. For reference, the top such author is George Dick, with a p-index of 0.124.

  7. All sources and datasets are available at https://adriansoto.cl/files/SPARQAL.zip.

  8. https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples.

References

  1. Hogan, A., et al.: Knowledge Graphs. CoRR abs/2003.02320 (2020)

  2. Bonatti, P.A., Decker, S., Polleres, A., Presutti, V.: Knowledge graphs: new directions for knowledge representation on the semantic web. Dagstuhl Rep. 8(9), 29–111 (2018)

  3. Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: a resilient distributed graph system on spark. In: International Workshop on Graph Data Management Experiences and Systems (GRADES). ACM Press (2013)

  4. Rodriguez, M.A.: The Gremlin graph traversal machine and language. In: Symposium on Database Programming Languages (DBPL), pp. 1–10. ACM (2015)

  5. Harris, S., Seaborne, A., Prud’hommeaux, E.: SPARQL 1.1 Query Language. W3C Recommendation (2013). https://www.w3.org/TR/sparql11-query/

  6. Francis, N., et al.: Cypher: an evolving query language for property graphs. In: SIGMOD, pp. 1433–1445. ACM (2018)

  7. Song, X., Chen, S., Zhang, X., Feng, Z.: A CONSTRUCT-based query for weighted RDF graph analytics. In: ISWC Satellites, pp. 25–28 (2019)

  8. Senanayake, U., Piraveenan, M., Zomaya, A.: The pagerank-index: going beyond citation counts in quantifying scientific impact of researchers. PLOS ONE 10(8), 1–34 (2015)

  9. Reutter, J.L., Soto, A., Vrgoč, D.: Recursion in SPARQL. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 19–35. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25007-6_2

  10. Corby, O., Faron-Zucker, C., Gandon, F.: LDScript: a linked data script language. In: d’Amato, C., et al. (eds.) ISWC 2017. LNCS, vol. 10587, pp. 208–224. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68288-4_13

  11. Urzua, V., Gutiérrez, C.: Linear recursion in G-CORE. In: Alberto Mendelzon International Workshop on Foundations of Data Management (AMW), vol. 2369. CEUR-WS.org (2019)

  12. Seo, J., Guo, S., Lam, M.S.: SociaLite: datalog extensions for efficient social network analysis. In: International Conference on Data Engineering (ICDE), pp. 278–289

  13. Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)

  14. DeLorimier, M., et al.: GraphStep: a system architecture for sparse-graph algorithms. In: IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 143–151. IEEE Computer Society (2006)

  15. Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: SIGMOD, pp. 135–146. ACM Press (2010)

  16. Krepska, E., Kielmann, T., Fokkink, W., Bal, H.E.: HipG: parallel processing of large-scale graphs. Oper. Syst. Rev. 45(2), 3–13 (2011)

  17. Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012, Hollywood, CA, USA, 8–10 October 2012, pp. 17–30. USENIX Association (2012)

  18. Ching, A., Edunov, S., Kabiljo, M., Logothetis, D., Muthukrishnan, S.: One trillion edges: graph processing at facebook-scale. PVLDB 8(12), 1804–1815 (2015)

  19. Stutz, P., Strebel, D., Bernstein, A.: Signal/Collect12. Semant. Web J. 7(2), 139–166 (2016)

  20. Low, Y., Gonzalez, J.E., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Graphlab: a new framework for parallel machine learning. CoRR abs/1408.2041 (2014)

  21. Kaminski, M., Grau, B.C., Kostylev, E.V., Motik, B., Horrocks, I.: Stratified negation in limit datalog programs. In: International Joint Conference on Artificial Intelligence, pp. 1875–1881. ijcai.org (2018)

  22. Bellomarini, L., Gottlob, G., Pieris, A., Sallinger, E.: Vadalog: a language and system for knowledge graphs. In: Benzmüller, C., Ricca, F., Parent, X., Roman, D. (eds.) RuleML+RR 2018. LNCS, vol. 11092, pp. 3–8. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99906-7_1

  23. Eisner, J., Filardo, N.W.: Dyna: extending datalog for modern AI. In: de Moor, O., Gottlob, G., Furche, T., Sellers, A. (eds.) Datalog 2.0 2010. LNCS, vol. 6702, pp. 181–220. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24206-9_11

  24. Carral, D., Dragoste, I., González, L., Jacobs, C., Krötzsch, M., Urbani, J.: VLog: a rule engine for knowledge graphs. In: Ghidini, C., et al. (eds.) ISWC 2019. LNCS, vol. 11779, pp. 19–35. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30796-7_2

  25. Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for web scale RDF data. PVLDB 6(4), 265–276 (2013)

  26. Shao, B., Wang, H., Li, Y.: Trinity: a distributed graph engine on a memory cloud. In: SIGMOD, pp. 505–516 (2013)

  27. Kostylev, E.V., Reutter, J.L., Romero, M., Vrgoč, D.: SPARQL with property paths. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 3–18. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25007-6_1

  28. Miller, J.J.: Graph database applications and concepts with Neo4j. In: Southern Association for Information Systems Conference (SAIS). AIS eLibrary (2013)

  29. Angles, R., et al.: G-CORE: a core for future graph query languages. In: SIGMOD, pp. 1421–1432 (2018)

  30. Angles, R., Arenas, M., Barceló, P., Hogan, A., Reutter, J.L., Vrgoč, D.: Foundations of modern query languages for graph databases. ACM Comput. Surv. 50(5), 68:1–68:40 (2017)

  31. Hogan, A.: Canonical forms for isomorphic and equivalent RDF graphs: algorithms for leaning and labelling blank nodes. ACM TWEB 11(4), 1–62 (2017)

  32. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford InfoLab (1999)

  33. Hogan, A., Reutter, J.L., Soto, A.: In-database graph analytics with recursive SPARQL [Extended Version]. https://adriansoto.cl/pdf/SPARQAL-Extended.pdf

  34. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases, vol. 8. Addison-Wesley Reading, Boston (1995)

  35. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM Trans. Database Syst. (TODS) 34(3), 16 (2009)

  36. Malyshev, S., Krötzsch, M., González, L., Gonsior, J., Bielefeldt, A.: Getting the most out of wikidata: semantic technology usage in wikipedia’s knowledge graph. In: Vrandečić, D., et al. (eds.) ISWC 2018. LNCS, vol. 11137, pp. 376–394. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00668-6_23

  37. LDBC: Graphalytics Benchmark Suite (2019). https://graphalytics.org/

  38. Raasveldt, M., Mühleisen, H.: Data management for data science-towards embedded analytics. In: CIDR (2020)

  39. Holanda, P., Raasveldt, M., Manegold, S., Mühleisen, H.: Progressive indexes: indexing for interactive data analysis. Proc. VLDB Endowment 12(13), 2366–2378 (2019)

Acknowledgements

This work was supported by the Millennium Institute for Foundational Research on Data (IMFD) and by Fondecyt Grant No. 1181896.

Author information

Corresponding author

Correspondence to Adrián Soto.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Hogan, A., Reutter, J.L., Soto, A. (2020). In-Database Graph Analytics with Recursive SPARQL. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12506. Springer, Cham. https://doi.org/10.1007/978-3-030-62419-4_29

  • DOI: https://doi.org/10.1007/978-3-030-62419-4_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62418-7

  • Online ISBN: 978-3-030-62419-4

  • eBook Packages: Computer Science, Computer Science (R0)
