SparkRDF: In-Memory Distributed RDF Management Framework for Large-Scale Social Data

Xu, Zhichao; Chen, Wei; Gai, Lei; Wang, Tengjiao

doi:10.1007/978-3-319-21042-1_27

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9098)

Abstract

Considering scalability and semantic requirements, the Resource Description Framework (RDF) and its de facto query language SPARQL are well suited for managing and querying online social network (OSN) data. Although some existing works have introduced distributed frameworks for querying large-scale data, improving online query performance is still a challenging task. To address this problem, this paper proposes a scalable RDF data framework that uses a key-value store for offline RDF storage and a pipelined, in-memory query strategy. The proposed framework efficiently supports SPARQL Basic Graph Pattern (BGP) queries on large-scale datasets. Experiments on a benchmark dataset demonstrate that the online SPARQL query performance of the framework exceeds that of existing distributed RDF solutions.
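
For readers unfamiliar with BGP queries, the following is a minimal sketch of how a basic graph pattern can be evaluated over in-memory triples by streaming variable bindings from one triple pattern into the next. It is an illustration only and is not the algorithm of the paper: the names `match_pattern` and `evaluate_bgp`, the toy triples, and the nested-loop matching are all assumptions made for this example, and no key-value store or cluster is involved.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Binding = Dict[str, str]        # variable name -> bound value


def is_var(term: str) -> bool:
    """Terms starting with '?' are treated as variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against an earlier binding.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate_bgp(patterns: List[Triple], triples: List[Triple]) -> Iterator[Binding]:
    """Evaluate a BGP lazily: each pattern consumes the previous pattern's bindings."""
    def extend(bindings: Iterable[Binding], pattern: Triple) -> Iterator[Binding]:
        for binding in bindings:
            for triple in triples:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    yield result

    stream: Iterable[Binding] = [{}]
    for pattern in patterns:
        stream = extend(stream, pattern)
    return iter(stream)


# Hypothetical toy social graph (identifiers are made up for illustration).
TRIPLES: List[Triple] = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:carol", "foaf:name", "Carol"),
]

# BGP: ?x foaf:knows ?y . ?y foaf:knows ?z  (friends of friends)
QUERY: List[Triple] = [
    ("?x", "foaf:knows", "?y"),
    ("?y", "foaf:knows", "?z"),
]

RESULTS = list(evaluate_bgp(QUERY, TRIPLES))
```

Because each stage is a generator, no intermediate result set is materialized between patterns, which is one simple reading of a "pipelined" strategy. A distributed system would replace the inner scan over `triples` with indexed lookups and partitioned joins.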

Author information

Authors and Affiliations

Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing, China
Zhichao Xu, Wei Chen, Lei Gai & Tengjiao Wang
School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Zhichao Xu, Wei Chen, Lei Gai & Tengjiao Wang

Corresponding author

Correspondence to Wei Chen.

Editor information

Editors and Affiliations

Google, CA, USA
Xin Luna Dong
Postdoc Apartments (Hong Lou) 4-1-4, Shandong University, Li Cheng, Jinan, China
Xiaohui Yu
Tsinghua University, Beijing, China
Jian Li
Northeastern University, Boston, USA
Yizhou Sun

About this paper

Cite this paper

Xu, Z., Chen, W., Gai, L., Wang, T. (2015). SparkRDF: In-Memory Distributed RDF Management Framework for Large-Scale Social Data. In: Dong, X., Yu, X., Li, J., Sun, Y. (eds) Web-Age Information Management. WAIM 2015. Lecture Notes in Computer Science, vol 9098. Springer, Cham. https://doi.org/10.1007/978-3-319-21042-1_27

DOI: https://doi.org/10.1007/978-3-319-21042-1_27
Published: 06 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21041-4
Online ISBN: 978-3-319-21042-1
eBook Packages: Computer Science, Computer Science (R0)
