RDFPartSuite: Bridging Physical and Logical RDF Partitioning

  • Conference paper
Big Data Analytics and Knowledge Discovery (DaWaK 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11708)

Abstract

The Resource Description Framework (RDF) has become a very popular graph-based standard, initially designed to represent information on the Web. Its flexibility has motivated its use in other domains, and today RDF datasets are large sources of information. Accordingly, research on scalable distributed and parallel RDF processing systems has gained momentum. Most of these systems apply partitioning algorithms that use the triple, the finest logical data structure in RDF, as the distribution unit. This purely physical strategy loses the graph structure of the model, which degrades performance. We believe that first gathering the triples that store the same logical entity helps not only to avoid scanning irrelevant data but also to create RDF partitions with an actual logical meaning. Moreover, this logical representation allows partitions to be defined with a declarative language, leaving implementation details aside. In this study, we give the formal definition of, and detail the algorithms to gather, these logical entities, which we name graph fragments (\(\mathcal{G}f\)) and use as distribution units for RDF datasets. The proposed logical entities correspond to the notions of partitioning by instances (horizontal) and by attributes (vertical) in the relational model. We propose allocation strategies for these fragments, covering the case in which replication is available and both fragments by instances and fragments by attributes are considered. We also discuss how to incorporate our declarative partitioning definition language into existing state-of-the-art systems. Our experiments on synthetic and real datasets show that graph fragments avert data skew. In addition, we show that this type of data organization is quantitatively promising for certain types of queries. All of the above techniques are integrated into a single framework that we call RDFPartSuite.
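
The idea of gathering triples by logical entity before distributing them can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes, for illustration only, that a logical entity is the set of triples sharing a subject, and that entities with the same set of predicates (in the spirit of the characteristic sets of [1]) fall into the same fragment. The sample triples and the function names `group_by_subject` and `fragments_by_predicate_set` are hypothetical.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples; all values are made up.
TRIPLES = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:worksAt", "ex:acme"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:worksAt", "ex:acme"),
    ("ex:acme", "ex:label", "ACME"),
]


def group_by_subject(triples):
    """Gather the triples describing the same subject (assumed logical entity)."""
    entities = defaultdict(list)
    for s, p, o in triples:
        entities[s].append((s, p, o))
    return dict(entities)


def fragments_by_predicate_set(triples):
    """Group entities that share the same set of predicates into one fragment.

    Assumption: the predicate set of a subject serves as the fragment key,
    which is a simplification chosen for this sketch.
    """
    fragments = defaultdict(list)
    for subject_triples in group_by_subject(triples).values():
        key = frozenset(p for _, p, _ in subject_triples)
        fragments[key].extend(subject_triples)
    return dict(fragments)


if __name__ == "__main__":
    for key, frag in fragments_by_predicate_set(TRIPLES).items():
        print(sorted(key), len(frag))
```

In this toy dataset the two person entities end up in one fragment and the organization in another, so a query touching only `ex:name` and `ex:worksAt` would not need to scan the organization's triples. Distributing whole fragments rather than individual triples is what keeps each entity's triples on the same site.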

Notes

  1. We use the term distributed RDF system interchangeably to denote both parallel and distributed RDF systems.

  2. According to the fragmentation correctness rules in [6].

References

  1. Neumann, T., Moerkotte, G.: Characteristic sets: accurate cardinality estimation for RDF queries with multiple joins. In: Proceedings of the 27th ICDE, Hannover, Germany, 11–16 April, pp. 984–994 (2011)

  2. Agrawal, S., Narasayya, V.R., Yang, B.: Integrating vertical and horizontal partitioning into automated physical database design. In: Proceedings of SIGMOD, Paris, France, 13–18 June, pp. 359–370 (2004)

  3. Ramamurthy, R., DeWitt, D.J., Su, Q.: A case for fractured mirrors. In: Proceedings of 28th VLDB 2002, Hong Kong, China, 20–23 August, pp. 430–441 (2002)

  4. Galicia, J., Mesmoudi, A., Bellatreche, L.: A logic dimension on RDF partitioning, Technical report (2019)

  5. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)

  6. Özsu, M.T., Valduriez, P.: Principles of Distributed Database Systems. Springer, New York (2011). https://doi.org/10.1007/978-1-4419-8834-8

  7. Du, J.-H., Wang, H.-F., Ni, Y., Yu, Y.: HadoopRDF: a scalable semantic data analytical engine. In: Huang, D.-S., Ma, J., Jo, K.-H., Gromiha, M.M. (eds.) ICIC 2012. LNCS (LNAI), vol. 7390, pp. 633–641. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31576-3_80

  8. Abadi, D.J., Marcus, A., Madden, S., Hollenbach, K.: SW-store: a vertically partitioned DBMS for semantic web data management. VLDB J. 18(2), 385–406 (2009)

  9. Zeng, L., Zou, L.: Redesign of the gStore system. Front. Comput. Sci. 12(4), 623–641 (2018)

  10. Mesmoudi, A.: Declarative parallel query processing on large scale astronomical databases. Ph.D. thesis, Doctoral School in Computer Science and Mathematics, Lyon, France (2015)

  11. Abdelaziz, I., Harbi, R., Khayyat, Z., Kalnis, P.: A survey and experimental comparison of distributed SPARQL engines for very large RDF data. PVLDB 10(13), 2049–2060 (2017)

  12. Gurobi Optimization, LLC: Gurobi optimizer reference manual (2018)

  13. Zhang, X., Chen, L., Tong, Y., Wang, M.: EAGRE: towards scalable I/O efficient SPARQL query evaluation on the cloud. In: 29th IEEE ICDE Brisbane, Australia, 8–12 April, pp. 565–576 (2013)

  14. Neumann, T., Weikum, G.: RDF-3X: a RISC-style engine for RDF. PVLDB 1(1), 647–659 (2008)

  15. Wilkinson, K.: Jena property table implementation (2006)

  16. Zou, L., Özsu, M.T., Chen, L., Shen, X., Huang, R., Zhao, D.: gStore: a graph-based SPARQL query engine. VLDB J. 23(4), 565–590 (2014)

  17. Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for web scale RDF data. PVLDB 6(4), 265–276 (2013)

  18. Papailiou, N., Konstantinou, I., Tsoumakos, D., Karras, P., Koziris, N.: H2RDF+: high-performance distributed joins over large-scale RDF graphs. In: Proceedings of the 2013 IEEE International Conference on Big Data, Santa Clara, CA, USA, 6–9 October, pp. 255–263 (2013)

  19. Lee, K., Liu, L.: Scaling queries over big RDF graphs with semantic hash partitioning. PVLDB 6(14), 1894–1905 (2013)

  20. Hose, K., Schenkel, R.: WARP: workload-aware replication and partitioning for RDF. In: Workshops Proceedings of the 29th IEEE ICDE 2013, Brisbane, Australia, 8–12 April, pp. 1–6 (2013)

  21. Saleem, M., Khan, Y., Hasnain, A., Ermilov, I., Ngomo, A.N.: A fine-grained evaluation of SPARQL endpoint federation systems. Semant. Web 7(5), 493–518 (2016)

  22. Janke, D., Staab, S., Thimm, M.: Impact analysis of data placement strategies on query efforts in distributed RDF stores. J. Web Semant. 50, 21–48 (2018)

Author information

Corresponding author

Correspondence to Jorge Galicia.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Galicia, J., Mesmoudi, A., Bellatreche, L. (2019). RDFPartSuite: Bridging Physical and Logical RDF Partitioning. In: Ordonez, C., Song, IY., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2019. Lecture Notes in Computer Science, vol 11708. Springer, Cham. https://doi.org/10.1007/978-3-030-27520-4_10

  • DOI: https://doi.org/10.1007/978-3-030-27520-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27519-8

  • Online ISBN: 978-3-030-27520-4

  • eBook Packages: Computer Science, Computer Science (R0)
