Distributed Semantic Analytics Using the SANSA Stack

  • Jens Lehmann
  • Gezim Sejdiu
  • Lorenz Bühmann
  • Patrick Westphal
  • Claus Stadler
  • Ivan Ermilov
  • Simon Bin
  • Nilesh Chakraborty
  • Muhammad Saleem
  • Axel-Cyrille Ngonga Ngomo
  • Hajira Jabeen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10588)

Abstract

A major research challenge is to perform scalable analysis of large-scale knowledge graphs to facilitate applications like link prediction, knowledge base completion and reasoning. Analytics methods which exploit expressive structures usually do not scale well to very large knowledge bases, and most analytics approaches which do scale horizontally (i.e., can be executed in a distributed environment) work on simple feature-vector-based input. This software framework paper describes the ongoing Semantic Analytics Stack (SANSA) project, which supports expressive and scalable semantic analytics by providing functionality for distributed computing on RDF data.

Notes

Acknowledgements

This work was partly supported by a grant from the European Union’s Horizon 2020 research and innovation programme for the project Big Data Europe (GA no. 644564) and a research grant from the German Ministry BMWI under the SAKE project (Grant No. 01MD15006E).

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Jens Lehmann (1, 2)
  • Gezim Sejdiu (1)
  • Lorenz Bühmann (3)
  • Patrick Westphal (3)
  • Claus Stadler (3)
  • Ivan Ermilov (3)
  • Simon Bin (3)
  • Nilesh Chakraborty (1)
  • Muhammad Saleem (3)
  • Axel-Cyrille Ngonga Ngomo (3, 4)
  • Hajira Jabeen (1)

  1. University of Bonn, Bonn, Germany
  2. Fraunhofer IAIS, Bonn, Germany
  3. Institute for Applied Informatics (InfAI), University of Leipzig, Leipzig, Germany
  4. Data Science Group, Paderborn University, Paderborn, Germany
