Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing

Khelil, Abdallah; Mesmoudi, Amin; Galicia, Jorge; Bellatreche, Ladjel; Hacid, Mohand-Saïd; Coquery, Emmanuel

doi:10.1007/s10796-020-09998-z

Published: 04 March 2020

Volume 23, pages 165–183, (2021)

Information Systems Frontiers

Abdallah Khelil¹,²,
Amin Mesmoudi ORCID: orcid.org/0000-0003-1307-591X³,
Jorge Galicia¹,
Ladjel Bellatreche¹,
Mohand-Saïd Hacid⁴ &
…
Emmanuel Coquery⁴

335 Accesses
4 Citations

Abstract

The flexibility offered by the Resource Description Framework (RDF) has made it a very popular standard for representing data with an undefined or variable schema using the concept of triples. Its success has resulted in many large-scale multidisciplinary datasets, which have prompted the development of efficient RDF processing systems. Current approaches fall into two groups: the first adopts the relational model and stores the triples in tables, and the second creates data structures that model RDF data as a graph. The strategies of the first group scale more easily since they apply optimization strategies from the relational model such as indexing and fragmentation. However, these approaches incur significant overhead when dealing with complex queries (e.g., compound SPARQL graphs involving filters) that are common in existing applications. On the other hand, graph-based systems that use more complex data structures fail to manage main memory efficiently and do not scale on hardware with limited resources. In this paper, we propose a novel approach to evaluating queries (Basic Graph Patterns, Wildcards, Aggregations and Sorting) on RDF data. We propose combining RDF graph exploration with physical fragmentation of triples. We describe our graph-based storage and query evaluation models. Then, we detail the architecture of our system and explain at length the strategy, based on the Volcano execution model, used to manage main memory at query runtime. We conducted extensive experiments on synthetic and real datasets to evaluate the efficiency of our proposal. We compared our performance with a relational-based (Virtuoso), a graph-based (gStore) and an intensive-indexing (RDF-3X) approach. According to our evaluation, our system offers the best compromise between efficient query processing and scalability.
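
As a rough illustration of the Volcano execution model mentioned in the abstract, the sketch below shows pull-based operators that each expose open, next and close, so a consumer draws one triple at a time instead of materializing intermediate results. This is not the authors' implementation: the class names Scan and Select, the run helper, and the sample triples are all hypothetical, and the system's actual memory management strategy is not described in this excerpt.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class Scan:
    """Leaf operator (hypothetical): yields triples one at a time from a source."""

    def __init__(self, source: Iterable[Triple]):
        self.source = source
        self._it: Optional[Iterator[Triple]] = None

    def open(self) -> None:
        self._it = iter(self.source)

    def next(self) -> Optional[Triple]:
        # Returns None once the source is exhausted.
        return next(self._it, None)

    def close(self) -> None:
        self._it = None


class Select:
    """Keeps only triples with a given predicate, pulling from its child on demand."""

    def __init__(self, child, predicate: str):
        self.child = child
        self.predicate = predicate

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Triple]:
        # Pull from the child until a matching triple is found or input ends.
        while (t := self.child.next()) is not None:
            if t[1] == self.predicate:
                return t
        return None

    def close(self) -> None:
        self.child.close()


def run(plan) -> List[Triple]:
    """Drives a plan by repeatedly pulling from its root operator."""
    plan.open()
    out = []
    while (t := plan.next()) is not None:
        out.append(t)
    plan.close()
    return out


# Made-up sample data, for illustration only.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "age", "30"),
    ("bob", "knows", "carol"),
]

result = run(Select(Scan(TRIPLES), "knows"))
```

Because each operator only holds the triple it is currently handing upward, the memory used by such a pipeline is bounded by the operators' own state rather than by the size of intermediate results, which is the property the abstract appeals to.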

Notes

https://github.com/bio2rdf/bio2rdf-scripts/wiki
http://wiki.dbpedia.org
Similar to SQL queries with wildcard characters
Subject Predicate Object
Queries with variable predicates can be answered by query rewriting
https://hadoop.apache.org
https://hbase.apache.org/
In the rest of this paper we use the term graph fragment instead of characteristic set to designate the physical split of SPO or OPS.
The set of predicates related to a subject (in the case of an SPO fragment) or to an object (in the case of an OPS fragment)
ϕ is used to denote an empty element
http://graphdb.ontotext.com/
https://github.com/pkumod/gStore
https://github.com/openlink/virtuoso-opensource
Queries list: https://www.lias-lab.fr/~amesmoudi/papers/ISF2020/Queries.pdf
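
The notes above describe a graph fragment as a physical split of SPO (or OPS) keyed by the set of predicates attached to a subject (or object). A minimal sketch of that grouping for the SPO case follows. The function name spo_fragments and the sample triples are hypothetical, and the paper's actual storage layout is not shown in this excerpt, so this only illustrates the grouping idea.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def spo_fragments(triples: Iterable[Triple]) -> Dict[FrozenSet[str], List[Triple]]:
    """Group triples so that subjects sharing the same predicate set land together."""
    triples = list(triples)

    # First pass: collect the set of predicates attached to each subject.
    preds_by_subject = defaultdict(set)
    for s, p, _ in triples:
        preds_by_subject[s].add(p)

    # Second pass: route each triple to the fragment of its subject's predicate set.
    fragments = defaultdict(list)
    for s, p, o in triples:
        fragments[frozenset(preds_by_subject[s])].append((s, p, o))
    return dict(fragments)


# Made-up sample data, for illustration only.
sample = [
    ("alice", "name", "Alice"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "knows", "alice"),
    ("paper1", "title", "RDF"),
]

frags = spo_fragments(sample)
```

Here alice and bob share the predicate set {name, knows} and so fall into one fragment, while paper1 forms a second fragment on its own. An OPS variant would key on the object instead of the subject.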

Author information

Authors and Affiliations

LIAS/ISAE-ENSMA, 86360, Chasseneuil-du-Poitou, France
Abdallah Khelil, Jorge Galicia & Ladjel Bellatreche
LAPECI/Université Oran 1, Oran, Algeria
Abdallah Khelil
LIAS/University of Poitiers, 86360, Chasseneuil-du-Poitou, France
Amin Mesmoudi
LIRIS/University of Lyon, 69621, Villeurbanne, France
Mohand-Saïd Hacid & Emmanuel Coquery

Corresponding author

Correspondence to Amin Mesmoudi.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khelil, A., Mesmoudi, A., Galicia, J. et al. Combining Graph Exploration and Fragmentation for Scalable RDF Query Processing. Inf Syst Front 23, 165–183 (2021). https://doi.org/10.1007/s10796-020-09998-z

Published: 04 March 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s10796-020-09998-z
