S2X: Graph-Parallel Querying of RDF with GraphX
Cite this paper as: Schätzle A., Przyjaciel-Zablocki M., Berberich T., Lausen G. (2016) S2X: Graph-Parallel Querying of RDF with GraphX. In: Wang F., Luo G., Weng C., Khan A., Mitra P., Yu C. (eds) Biomedical Data Management and Graph Online Querying. Big-O(Q) 2015. Lecture Notes in Computer Science, vol 9579. Springer, Cham
RDF has steadily gained attention for data publishing due to its flexible data model, raising the need for distributed querying. However, existing approaches that use general-purpose cluster frameworks take a record-oriented view of RDF, ignoring its inherent graph-like structure. Recently, GraphX was published as a graph abstraction on top of Spark, an in-memory cluster computing system. It makes it possible to seamlessly combine graph-parallel and data-parallel computation in a single system, a unique feature not available in other systems. In this paper we introduce S2X, a SPARQL query processor for Hadoop in which we leverage this unified abstraction by implementing basic graph pattern matching of SPARQL as a graph-parallel task, while other operators are implemented in a data-parallel manner. To the best of our knowledge, this is the first approach to combine graph-parallel and data-parallel computation for SPARQL querying of RDF data based on Hadoop.
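To make the split between a graph-parallel matching phase and a data-parallel phase more concrete, here is a toy, single-machine sketch in Python. It is a hypothetical illustration, not S2X's algorithm and not GraphX code. Each RDF term is treated as a vertex that keeps the set of query variables it could still bind. In synchronous "supersteps", each vertex re-checks those candidates against its neighbours' candidates from the previous step until nothing changes. A plain join over the surviving candidates then produces the bindings. All function names are made up for this example, predicates in the pattern are assumed to be constants, and nothing here runs distributed.

```python
# Hypothetical toy sketch: not S2X's implementation and not the GraphX API.
# Assumption: BGP predicates are constants; only subjects/objects may be variables.

Triple = tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def term_ok(term: str, value: str, cand: dict[str, set[str]]) -> bool:
    """A constant must equal the value; a variable must still list it as a candidate."""
    return value in cand[term] if is_var(term) else term == value


def vertex_fits(v: str, var: str, graph: list[Triple], bgp: list[Triple],
                cand: dict[str, set[str]]) -> bool:
    """Check whether vertex v can still bind var, conceptually using only v's own
    edges and its neighbours' candidate sets from the previous superstep."""
    for s, p, o in bgp:
        if s == var and not any(gs == v and gp == p and term_ok(o, go, cand)
                                for gs, gp, go in graph):
            return False
        if o == var and not any(go == v and gp == p and term_ok(s, gs, cand)
                                for gs, gp, go in graph):
            return False
    return True


def prune_candidates(graph: list[Triple], bgp: list[Triple]) -> dict[str, set[str]]:
    """Graph-parallel-style phase: repeat supersteps until no candidate set shrinks."""
    vertices = {s for s, _, _ in graph} | {o for _, _, o in graph}
    variables = {t for s, _, o in bgp for t in (s, o) if is_var(t)}
    cand = {var: set(vertices) for var in variables}
    while True:
        # Synchronous update: every check reads the previous superstep's candidates.
        new_cand = {var: {v for v in cand[var]
                          if vertex_fits(v, var, graph, bgp, cand)}
                    for var in variables}
        if new_cand == cand:
            return cand
        cand = new_cand


def bind(binding: dict[str, str], term: str, value: str,
         cand: dict[str, set[str]]) -> bool:
    """Extend binding in place; return False on a conflict or a pruned value."""
    if not is_var(term):
        return term == value
    if value not in cand[term]:
        return False
    return binding.setdefault(term, value) == value


def match_bgp(graph: list[Triple], bgp: list[Triple]) -> list[dict[str, str]]:
    """Data-parallel-style phase (done sequentially here): join the surviving candidates."""
    cand = prune_candidates(graph, bgp)
    results: list[dict[str, str]] = [{}]
    for s, p, o in bgp:
        extended = []
        for binding in results:
            for gs, gp, go in graph:
                new = dict(binding)
                if gp == p and bind(new, s, gs, cand) and bind(new, o, go, cand):
                    extended.append(new)
        results = extended
    return results


if __name__ == "__main__":
    graph = [("alice", "knows", "bob"), ("bob", "knows", "carol"),
             ("alice", "worksAt", "uni"), ("carol", "worksAt", "uni"),
             ("bob", "likes", "pizza")]
    bgp = [("?x", "knows", "?y"), ("?y", "worksAt", "?z")]
    print(match_bgp(graph, bgp))  # [{'?x': 'bob', '?y': 'carol', '?z': 'uni'}]
```

In this example, pruning alone already narrows the candidates to `?x = bob`, `?y = carol` and `?z = uni`, so the join only has a trivial amount of work left. In a real distributed setting, the attraction of this kind of split is that the pruning step maps naturally onto vertex-centric message passing, while the remaining joins map onto ordinary data-parallel operators.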