Abstract
RDF has constantly gained attention for data publishing due to its flexible data model, raising the need for distributed querying. However, existing approaches using general-purpose cluster frameworks employ a record-oriented perception of RDF ignoring its inherent graph-like structure. Recently, GraphX was published as a graph abstraction on top of Spark, an in-memory cluster computing system. It allows to seamlessly combine graph-parallel and data-parallel computation in a single system, an unique feature not available in other systems. In this paper we introduce S2X, a SPARQL query processor for Hadoop where we leverage this unified abstraction by implementing basic graph pattern matching of SPARQL as a graph-parallel task while other operators are implemented in a data-parallel manner. To the best of our knowledge, this is the first approach to combine graph-parallel and data-parallel computation for SPARQL querying of RDF data based on Hadoop.
Keywords
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Aluç, G., Hartig, O., Özsu, M.T., Daudjee, K.: Diversified stress testing of RDF data management systems. In: Mika, P., et al. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 197–212. Springer, Heidelberg (2014)
Fard, A., Nisar, M., Ramaswamy, L., Miller, J., Saltz, M.: A distributed vertex-centric approach for pattern matching in massive graphs. In: IEEE Big Data, pp. 403–411 (2013)
Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: 11th USENIX OSDI 2014, pp. 599–613 (2014)
Goodman, E.L., Grunwald, D.: Using vertex-centric programming platforms to implement SPARQL queries on large graphs. In: IA3 (2014)
Han, M., Daudjee, K., Ammar, K., Özsu, M.T., Wang, X., Jin, T.: An experimental comparison of pregel-like graph processing systems. PVLDB 7(12), 1047–1058 (2014)
Huang, J., Abadi, D.J., Ren, K.: Scalable SPARQL querying of large RDF graphs. PVLDB 4(11), 1123–1134 (2011)
Husain, M.F., McGlothlin, J.P., Masud, M.M., Khan, L.R., Thuraisingham, B.M.: Heuristics-based query processing for large RDF graphs using cloud computing. IEEE TKDE 23(9), 1312–1327 (2011)
Manola, F., Miller, E., McBride, B.: RDF Primer (2004). http://www.w3.org/TR/rdf-primer/
Papailiou, N., Konstantinou, I., Tsoumakos, D., Karras, P., Koziris, N.: H2RDF+: High-performance distributed joins over large-scale RDF graphs. In: IEEE Big Data, pp. 255–263 (2013)
Prud’hommeaux, E., Seaborne, A.: SPARQL query language for RDF (2008). http://www.w3.org/TR/rdf-sparql-query/
Schätzle, A., Przyjaciel-Zablocki, M., Hornung, T., Lausen, G.: PigSPARQL: A SPARQL query processing baseline for big data. In: Proceedings of the ISWC 2013 Posters & Demonstrations Track, pp. 241–244 (2013)
Schätzle, A., Przyjaciel-Zablocki, M., Neu, A., Lausen, G.: Sempala: interactive SPARQL query processing on hadoop. In: Mika, P., et al. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 164–179. Springer, Heidelberg (2014)
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M.J., Shenker, S., Stoica, I.: Fast and interactive analytics over hadoop data with spark. USENIX; Login 34(4), 45–51 (2012)
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation, NSDI, pp. 15–28 (2012)
Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for web scale RDF data. In: PVLDB 2013, pp. 265–276 (2013)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Schätzle, A., Przyjaciel-Zablocki, M., Berberich, T., Lausen, G. (2016). S2X: Graph-Parallel Querying of RDF with GraphX. In: Wang, F., Luo, G., Weng, C., Khan, A., Mitra, P., Yu, C. (eds) Biomedical Data Management and Graph Online Querying. Big-O(Q) DMAH 2015 2015. Lecture Notes in Computer Science(), vol 9579. Springer, Cham. https://doi.org/10.1007/978-3-319-41576-5_12
Download citation
DOI: https://doi.org/10.1007/978-3-319-41576-5_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41575-8
Online ISBN: 978-3-319-41576-5
eBook Packages: Computer ScienceComputer Science (R0)