S2X: Graph-Parallel Querying of RDF with GraphX

  • Alexander Schätzle
  • Martin Przyjaciel-Zablocki
  • Thorsten Berberich
  • Georg Lausen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9579)

Abstract

RDF has steadily gained attention for data publishing due to its flexible data model, raising the need for distributed querying. However, existing approaches built on general-purpose cluster frameworks take a record-oriented view of RDF, ignoring its inherent graph-like structure. Recently, GraphX was published as a graph abstraction on top of Spark, an in-memory cluster computing system. It allows graph-parallel and data-parallel computation to be combined seamlessly in a single system, a unique feature not available in other systems. In this paper we introduce S2X, a SPARQL query processor for Hadoop that leverages this unified abstraction: basic graph pattern matching of SPARQL is implemented as a graph-parallel task, while other operators are implemented in a data-parallel manner. To the best of our knowledge, this is the first approach to combine graph-parallel and data-parallel computation for SPARQL querying of RDF data based on Hadoop.
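To make the notion of basic graph pattern (BGP) matching concrete, the following is a minimal single-machine sketch in Python. It is not the S2X algorithm and does not use GraphX or Spark; it only illustrates what a BGP computes: each triple pattern is matched against the edges of the RDF graph, and the resulting variable bindings are joined on shared variables. The toy triples and the helper names (`match_edge`, `join`, `match_bgp`) are hypothetical and chosen for illustration only.

```python
# Toy RDF graph as (subject, predicate, object) edges. All data is made up.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "likes", "carol"),
    ("bob", "likes", "alice"),
]


def is_var(term):
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match_edge(pattern, triple):
    """Return the variable bindings if the triple matches the pattern, else None."""
    binding = {}
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable used twice in one pattern must bind to the same value.
            if binding.get(p_term, t_term) != t_term:
                return None
            binding[p_term] = t_term
        elif p_term != t_term:
            return None
    return binding


def join(left, right):
    """Merge pairs of bindings that agree on every shared variable."""
    merged = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                merged.append({**a, **b})
    return merged


def match_bgp(patterns, triples):
    """Match every triple pattern against all edges, then join the bindings."""
    results = [{}]  # one empty binding: the neutral element of the join
    for pattern in patterns:
        candidates = []
        for triple in triples:
            binding = match_edge(pattern, triple)
            if binding is not None:
                candidates.append(binding)
        results = join(results, candidates)
    return results


# BGP: ?x knows ?y . ?y knows ?z
BGP = [("?x", "knows", "?y"), ("?y", "knows", "?z")]
solutions = match_bgp(BGP, TRIPLES)
print(solutions)
```

In a distributed setting the per-edge matching step is the part that lends itself to a graph-parallel formulation, while joins and other operators on the resulting bindings resemble ordinary data-parallel processing; the sketch above runs both steps sequentially for clarity.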

Keywords

RDF, SPARQL, Hadoop, Spark, GraphX

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Alexander Schätzle (1)
  • Martin Przyjaciel-Zablocki (1)
  • Thorsten Berberich (1)
  • Georg Lausen (1)

  1. Department of Computer Science, University of Freiburg, Freiburg, Germany
