SparkRDF: In-Memory Distributed RDF Management Framework for Large-Scale Social Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9098)

Abstract

Given the scalability and semantic requirements of online social network (OSN) data, the Resource Description Framework (RDF) and its de facto query language, SPARQL, are well suited to managing and querying such data. Although some existing works have introduced distributed frameworks for querying large-scale data, improving online query performance remains a challenging task. To address this problem, this paper proposes a scalable RDF data framework that uses a key-value store for offline RDF storage and a pipelined, in-memory query strategy. The proposed framework efficiently supports SPARQL Basic Graph Pattern (BGP) queries on large-scale datasets. Experiments on a benchmark dataset show that our framework achieves better online SPARQL query performance than existing distributed RDF solutions.
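
The abstract names two ingredients: a key-value layout for stored triples and in-memory evaluation of SPARQL BGP queries. Below is a minimal single-machine Python sketch of these ideas, under stated assumptions; it is not the paper's implementation. Assumptions: triples are grouped under their predicate as the key (a hypothetical layout, not the paper's actual storage scheme), every triple pattern has a constant predicate, and variables are strings starting with `?`. The BGP is answered by chaining lazy iterators, one per triple pattern, so each partial binding flows straight into the next join step; this is only a local analogy for the paper's distributed, pipelined strategy. The names `build_predicate_index`, `unify`, `extend`, and `evaluate_bgp` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]
Index = Dict[str, List[Tuple[str, str]]]


def build_predicate_index(triples: List[Triple]) -> Index:
    """Hypothetical key-value layout: predicate -> list of (subject, object)."""
    index: Index = defaultdict(list)
    for s, p, o in triples:
        index[p].append((s, o))
    return index


def unify(term: str, value: str, binding: Binding) -> Optional[Binding]:
    """Return a binding consistent with term == value, or None on conflict."""
    if term.startswith("?"):
        bound = binding.get(term)
        if bound is None:
            extended = dict(binding)  # copy so other branches stay unaffected
            extended[term] = value
            return extended
        return binding if bound == value else None
    return binding if term == value else None  # constant term must match exactly


def extend(bindings: Iterable[Binding], pattern: Triple, index: Index) -> Iterator[Binding]:
    """Join each incoming binding with the triples stored under the pattern's predicate."""
    s_term, p_term, o_term = pattern
    pairs = index.get(p_term, [])  # assumption: predicate is a constant
    for b in bindings:
        for s, o in pairs:
            b1 = unify(s_term, s, b)
            if b1 is None:
                continue
            b2 = unify(o_term, o, b1)
            if b2 is not None:
                yield b2


def evaluate_bgp(index: Index, patterns: List[Triple]) -> List[Binding]:
    """Chain one lazy join stage per pattern, then drain the pipeline."""
    stream: Iterable[Binding] = iter([{}])
    for pattern in patterns:
        stream = extend(stream, pattern, index)
    return list(stream)


# Usage on a tiny FOAF-style social graph (illustrative data only).
triples = [
    ("alice", "foaf:knows", "bob"),
    ("bob", "foaf:knows", "carol"),
    ("alice", "foaf:name", "Alice"),
    ("bob", "foaf:name", "Bob"),
]
index = build_predicate_index(triples)
query = [("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "?n")]
print(evaluate_bgp(index, query))  # [{'?x': 'alice', '?y': 'bob', '?n': 'Bob'}]
```

The order of the patterns matters. Putting the most selective pattern first keeps the stream of intermediate bindings small. Production engines usually pick this order from data statistics rather than query text order.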



Author information

Correspondence to Wei Chen.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Xu, Z., Chen, W., Gai, L., Wang, T. (2015). SparkRDF: In-Memory Distributed RDF Management Framework for Large-Scale Social Data. In: Dong, X., Yu, X., Li, J., Sun, Y. (eds) Web-Age Information Management. WAIM 2015. Lecture Notes in Computer Science, vol 9098. Springer, Cham. https://doi.org/10.1007/978-3-319-21042-1_27


  • DOI: https://doi.org/10.1007/978-3-319-21042-1_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21041-4

  • Online ISBN: 978-3-319-21042-1

  • eBook Packages: Computer Science, Computer Science (R0)
