Abstract
The Resource Description Framework (RDF) is a widely used graph-based standard originally designed to represent information on the Web. Its flexibility has motivated its use in other domains, and RDF datasets are now large sources of information. Consequently, research on scalable distributed and parallel RDF processing systems has gained momentum. Most of these systems apply partitioning algorithms that use the triple, the finest logical data structure in RDF, as the distribution unit. This purely physical strategy loses the graph structure of the model, which degrades performance. We believe that first gathering the triples that store the same logical entity helps not only to avoid scanning irrelevant data but also to create RDF partitions with an actual logical meaning. Moreover, this logical representation allows partitions to be defined with a declarative language, leaving implementation details aside. In this study, we give the formal definition of, and detail the algorithms to gather, these logical entities, which we name graph fragments (\(\mathcal{G}f\)) and use as distribution units for RDF datasets. The proposed logical entities correspond to the notions of partitioning by instances (horizontal) and by attributes (vertical) in the relational model. We propose allocation strategies for these fragments, covering the case in which replication is available and both fragments by instances and fragments by attributes are considered. We also discuss how to incorporate our declarative partition definition language into existing state-of-the-art systems. Our experiments on synthetic and real datasets show that graph fragments avert data skew. In addition, we show that this type of data organization yields promising quantitative results for certain types of queries. All of these techniques are integrated into a single framework that we call RDFPartSuite.
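The idea of distributing logical entities rather than individual triples can be illustrated with a small sketch. The abstract does not give the formal definition of a graph fragment, so the code below rests on a loudly stated assumption: triples are first gathered by subject, and subjects that emit the same set of predicates (in the spirit of the characteristic sets cited in the references) are then merged into one fragment. The sample triples, the function names, and the greedy placement rule are all hypothetical and are not taken from RDFPartSuite.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples; every identifier is hypothetical.
TRIPLES = [
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:alice", "ex:worksAt", "ex:lab1"),
    ("ex:bob", "ex:name", '"Bob"'),
    ("ex:bob", "ex:worksAt", "ex:lab1"),
    ("ex:lab1", "ex:label", '"Lab 1"'),
]


def group_by_subject(triples):
    """Collect the triples that describe each subject."""
    entities = defaultdict(list)
    for s, p, o in triples:
        entities[s].append((s, p, o))
    return entities


def build_fragments(triples):
    """Merge subjects that share the same predicate set into one fragment.

    Assumption: the predicate set stands in for the type of a logical entity.
    """
    fragments = defaultdict(list)
    for subject_triples in group_by_subject(triples).values():
        signature = frozenset(p for _, p, _ in subject_triples)
        fragments[signature].extend(subject_triples)
    return fragments


def allocate(fragments, n_sites):
    """Greedy placement: largest fragment first, onto the least loaded site."""
    sites = [[] for _ in range(n_sites)]
    order = sorted(fragments, key=lambda sig: (-len(fragments[sig]), sorted(sig)))
    for signature in order:
        target = min(sites, key=len)  # ties go to the lowest-numbered site
        target.extend(fragments[signature])
    return sites


if __name__ == "__main__":
    frags = build_fragments(TRIPLES)
    for site_id, site in enumerate(allocate(frags, 2)):
        print(site_id, len(site), "triples")
```

Because a whole fragment is placed on one site, a query that touches only one kind of entity reads a single fragment instead of scanning triples scattered across every site. The greedy rule here is only a placeholder for an allocation strategy; it ignores replication and query workload.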
Notes
1. We use the term distributed RDF system interchangeably to denote both parallel and distributed RDF systems.
2. According to the fragmentation correctness rules in [6].
References
1. Neumann, T., Moerkotte, G.: Characteristic sets: accurate cardinality estimation for RDF queries with multiple joins. In: Proceedings of the 27th ICDE, Hannover, Germany, 11–16 April, pp. 984–994 (2011)
2. Agrawal, S., Narasayya, V.R., Yang, B.: Integrating vertical and horizontal partitioning into automated physical database design. In: Proceedings of SIGMOD, Paris, France, 13–18 June, pp. 359–370 (2004)
3. Ramamurthy, R., DeWitt, D.J., Su, Q.: A case for fractured mirrors. In: Proceedings of 28th VLDB 2002, Hong Kong, China, 20–23 August, pp. 430–441 (2002)
4. Galicia, J., Mesmoudi, A., Bellatreche, L.: A logic dimension on RDF partitioning, Technical report (2019)
5. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)
6. Özsu, M.T., Valduriez, P.: Principles of Distributed Database Systems. Springer, New York (2011). https://doi.org/10.1007/978-1-4419-8834-8
7. Du, J.-H., Wang, H.-F., Ni, Y., Yu, Y.: HadoopRDF: a scalable semantic data analytical engine. In: Huang, D.-S., Ma, J., Jo, K.-H., Gromiha, M.M. (eds.) ICIC 2012. LNCS (LNAI), vol. 7390, pp. 633–641. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31576-3_80
8. Abadi, D.J., Marcus, A., Madden, S., Hollenbach, K.: SW-store: a vertically partitioned DBMS for semantic web data management. VLDB J. 18(2), 385–406 (2009)
9. Zeng, L., Zou, L.: Redesign of the gStore system. Front. Comput. Sci. 12(4), 623–641 (2018)
10. Mesmoudi, A.: Declarative parallel query processing on large scale astronomical databases. Ph.D. thesis, Doctoral School in Computer Science and Mathematics, Lyon, France (2015)
11. Abdelaziz, I., Harbi, R., Khayyat, Z., Kalnis, P.: A survey and experimental comparison of distributed SPARQL engines for very large RDF data. PVLDB 10(13), 2049–2060 (2017)
12. Gurobi Optimization, LLC: Gurobi optimizer reference manual (2018)
13. Zhang, X., Chen, L., Tong, Y., Wang, M.: EAGRE: towards scalable I/O efficient SPARQL query evaluation on the cloud. In: 29th IEEE ICDE Brisbane, Australia, 8–12 April, pp. 565–576 (2013)
14. Neumann, T., Weikum, G.: RDF-3X: a RISC-style engine for RDF. PVLDB 1(1), 647–659 (2008)
15. Wilkinson, K.: Jena property table implementation (2006)
16. Zou, L., Özsu, M.T., Chen, L., Shen, X., Huang, R., Zhao, D.: gStore: a graph-based SPARQL query engine. VLDB J. 23(4), 565–590 (2014)
17. Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for web scale RDF data. PVLDB 6(4), 265–276 (2013)
18. Papailiou, N., Konstantinou, I., Tsoumakos, D., Karras, P., Koziris, N.: H2RDF+: high-performance distributed joins over large-scale RDF graphs. In: Proceedings of the 2013 IEEE International Conference on Big Data, Santa Clara, CA, USA, 6–9 October, pp. 255–263 (2013)
19. Lee, K., Liu, L.: Scaling queries over big RDF graphs with semantic hash partitioning. PVLDB 6(14), 1894–1905 (2013)
20. Hose, K., Schenkel, R.: WARP: workload-aware replication and partitioning for RDF. In: Workshops Proceedings of the 29th IEEE ICDE 2013, Brisbane, Australia, 8–12 April, pp. 1–6 (2013)
21. Saleem, M., Khan, Y., Hasnain, A., Ermilov, I., Ngomo, A.N.: A fine-grained evaluation of SPARQL endpoint federation systems. Semant. Web 7(5), 493–518 (2016)
22. Janke, D., Staab, S., Thimm, M.: Impact analysis of data placement strategies on query efforts in distributed RDF stores. J. Web Semant. 50, 21–48 (2018)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Galicia, J., Mesmoudi, A., Bellatreche, L. (2019). RDFPartSuite: Bridging Physical and Logical RDF Partitioning. In: Ordonez, C., Song, IY., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2019. Lecture Notes in Computer Science, vol 11708. Springer, Cham. https://doi.org/10.1007/978-3-030-27520-4_10
DOI: https://doi.org/10.1007/978-3-030-27520-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27519-8
Online ISBN: 978-3-030-27520-4
eBook Packages: Computer Science, Computer Science (R0)