Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Framework-Based Scale-Out RDF Systems

  • Marcin Wylot
  • Sherif Sakr
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_225-1

Definition

RDF, the Resource Description Framework, is recognized as a de facto standard for describing resources in a semi-structured manner. In particular, RDF is a graph-based format that allows named links between resources to be defined in the form of triples (subject, predicate, object), also called statements. A statement expresses a relationship (defined by the predicate) between two resources (the subject and the object). The relationship is directional: it always points from subject to object. The same resource can appear in multiple triples, playing the same or different roles; for example, it can be the subject of one triple and the predicate or object of another. This ability enables multiple connections between triples and hence the creation of a connected graph of data. Such a graph can be represented as nodes that stand for the resources and edges capturing the relationships between the...
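
As a rough illustration of the triple model described above, the following Python sketch stores a few statements as (subject, predicate, object) tuples and derives a directed graph from them. It uses only the standard library rather than any RDF toolkit; the `ex:` resource names and the helper functions `build_graph` and `roles` are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: a directed, named link from subject to object."""
    subject: str
    predicate: str
    object: str


# Hypothetical example data; the ex: names are invented for illustration.
TRIPLES = [
    Triple("ex:alice", "ex:knows", "ex:bob"),
    Triple("ex:bob", "ex:worksAt", "ex:acme"),
    Triple("ex:acme", "ex:locatedIn", "ex:berlin"),
]


def build_graph(triples):
    """Map each subject node to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.object))
    return dict(graph)


def roles(resource, triples):
    """Return the set of positions (roles) in which a resource occurs."""
    found = set()
    for t in triples:
        for role, value in t._asdict().items():
            if value == resource:
                found.add(role)
    return found


graph = build_graph(TRIPLES)
print(graph["ex:alice"])                 # outgoing edges of ex:alice
print(sorted(roles("ex:bob", TRIPLES)))  # ex:bob is both an object and a subject
```

Because `ex:bob` is the object of the first triple and the subject of the second, the two statements share a node, which is how separate triples join into one connected graph.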

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. TU Berlin/Fraunhofer FOKUS, Berlin, Germany
  2. School of Computer Science and Engineering (CSE), University of New South Wales, Sydney, Australia

Section editors and affiliations

  • Philippe Cudré-Mauroux (1)
  • Olaf Hartig (2)
  1. eXascale Infolab, University of Fribourg, Fribourg, Switzerland
  2. Linköping University