S2X: Graph-Parallel Querying of RDF with GraphX

  • Alexander Schätzle
  • Martin Przyjaciel-Zablocki
  • Thorsten Berberich
  • Georg Lausen
Conference paper

DOI: 10.1007/978-3-319-41576-5_12

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9579)
Cite this paper as:
Schätzle A., Przyjaciel-Zablocki M., Berberich T., Lausen G. (2016) S2X: Graph-Parallel Querying of RDF with GraphX. In: Wang F., Luo G., Weng C., Khan A., Mitra P., Yu C. (eds) Biomedical Data Management and Graph Online Querying. Big-O(Q) 2015, DMAH 2015. Lecture Notes in Computer Science, vol 9579. Springer, Cham

Abstract

RDF has steadily gained attention for data publishing due to its flexible data model, raising the need for distributed querying. However, existing approaches that use general-purpose cluster frameworks employ a record-oriented perception of RDF, ignoring its inherent graph-like structure. Recently, GraphX was published as a graph abstraction on top of Spark, an in-memory cluster computing system. It allows users to seamlessly combine graph-parallel and data-parallel computation in a single system, a unique feature not available in other systems. In this paper we introduce S2X, a SPARQL query processor for Hadoop in which we leverage this unified abstraction by implementing basic graph pattern matching of SPARQL as a graph-parallel task, while the other operators are implemented in a data-parallel manner. To the best of our knowledge, this is the first approach to combine graph-parallel and data-parallel computation for SPARQL querying of RDF data based on Hadoop.
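The abstract splits query evaluation into two kinds of work: basic graph pattern (BGP) matching as a graph-parallel, vertex-centric task, and the remaining SPARQL operators as data-parallel tasks. The Python sketch below illustrates that split on a toy graph. It is a simplified illustration under our own assumptions, not S2X's actual algorithm and not the GraphX API. Supersteps are simulated by a sequential loop in which every vertex discards query variables it cannot bind, looking only at its incident edges and at its neighbours' candidate sets from the previous round. A nested-loop join then assembles full bindings, and a list comprehension stands in for a data-parallel FILTER and projection. The names `TRIPLES`, `BGP`, `prune_candidates` and `match_bgp` are hypothetical, and predicates in the BGP are assumed to be constants.

```python
from collections import defaultdict

# Hypothetical toy RDF graph as (subject, predicate, object) triples; not from the paper.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "freiburg"),
    ("bob", "worksAt", "berlin"),
    ("dave", "knows", "erin"),
]
# Hypothetical BGP: ?a knows ?b . ?b worksAt ?c   (predicates assumed constant)
BGP = [("?a", "knows", "?b"), ("?b", "worksAt", "?c")]


def is_var(term):
    return term.startswith("?")


def matches(term, vertex, cand):
    # A variable matches a vertex still holding it as a candidate; a constant must be equal.
    return term in cand[vertex] if is_var(term) else term == vertex


def supported(v, var, bgp, cand, out_edges, in_edges):
    # Local check: v keeps `var` only if every pattern using `var` is backed by an incident edge.
    for s, p, o in bgp:
        if s == var and not any(q == p and matches(o, w, cand) for q, w in out_edges[v]):
            return False
        if o == var and not any(q == p and matches(s, u, cand) for q, u in in_edges[v]):
            return False
    return True


def prune_candidates(triples, bgp):
    """Graph-parallel phase, simulated as synchronous supersteps over all vertices."""
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))
        in_edges[o].append((p, s))
    vertices = set(out_edges) | set(in_edges)
    variables = {t for tp in bgp for t in (tp[0], tp[2]) if is_var(t)}
    cand = {v: set(variables) for v in vertices}  # superstep 0: every variable everywhere
    while True:
        # Each vertex recomputes its set from the previous superstep's state only.
        new = {v: {x for x in cand[v] if supported(v, x, bgp, cand, out_edges, in_edges)}
               for v in vertices}
        if new == cand:  # no vertex changed: fixpoint reached
            return cand
        cand = new


def extend(binding, term, value, cand):
    # Try to bind `term` to `value`; return the (possibly new) binding or None on conflict.
    if not is_var(term):
        return binding if term == value else None
    if term not in cand[value]:
        return None  # vertex was pruned for this variable in the graph-parallel phase
    if term in binding:
        return binding if binding[term] == value else None
    return {**binding, term: value}


def match_bgp(triples, bgp, cand):
    """Assemble full bindings with a nested-loop join, restricted to surviving candidates."""
    results = [{}]
    for s, p, o in bgp:
        next_results = []
        for binding in results:
            for ts, tp, to in triples:
                if tp != p:
                    continue
                b = extend(binding, s, ts, cand)
                if b is not None:
                    b = extend(b, o, to, cand)
                if b is not None:
                    next_results.append(b)
        results = next_results
    return results


if __name__ == "__main__":
    cand = prune_candidates(TRIPLES, BGP)
    rows = match_bgp(TRIPLES, BGP, cand)
    # Stand-in for data-parallel operators: FILTER(?c != "berlin"), then project ?a.
    print([r["?a"] for r in rows if r["?c"] != "berlin"])  # → ['bob']
```

Pruning only removes vertices that cannot take part in any match (here `dave` and `erin`, because `erin` has no `worksAt` edge), so it is a necessary but not sufficient condition; the join step is still required to guarantee that each binding satisfies the whole pattern.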

Keywords

RDF, SPARQL, Hadoop, Spark, GraphX

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Alexander Schätzle (1)
  • Martin Przyjaciel-Zablocki (1)
  • Thorsten Berberich (1)
  • Georg Lausen (1)
  1. Department of Computer Science, University of Freiburg, Freiburg, Germany
