In-memory parallelization of join queries over large ontological hierarchies

Bilidas, Dimitris; Koubarakis, Manolis

doi:10.1007/s10619-020-07305-y

In-memory parallelization of join queries over large ontological hierarchies

Published: 29 June 2020

Volume 39, pages 545–582, (2021)
Cite this article

Distributed and Parallel Databases Aims and scope Submit manuscript

Dimitris Bilidas¹ &
Manolis Koubarakis¹

316 Accesses
4 Citations
Explore all metrics

Abstract

The Resource Description Framework (RDF) data model enables the construction of knowledge graphs over various domains, using ontologies in order to encode information about the domain, and simple statements in the form of subject-predicate-object triples for data representation, facilitating the interlinking and exchange of Web data. However, this simplicity comes with the cost of having to execute a large number of joins in order to get the desirable query results, while at the same time large ontological hierarchies complicate the query answering process even more, for systems that provide complete answers with respect to such ontological axioms. In this work we present PARJ, an in-memory RDF store which takes into consideration ontological hierarchies during join processing with very low performance overhead, avoiding expensive preprocessing and materialization of implications, and is also amenable to straightforward parallelization. Specifically, we present a join implementation that allows to achieve any desired degree of parallelism on arbitrary join queries and RDF graphs stored in memory using compact vertical partitioning. We use an adaptive join processing approach, such that we take advantage of complete or even partial ordering of RDF data, which is compactly stored in order to increase spatial locality and keep memory consumption low, coupled with an ID-to-Position vector index used when ordering does not allow for efficient scanning of the input relation. Finally, we experimentally show the efficiency and scalability of our proposal.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Universal Storage Adaption for Distributed RDF-Triple Stores

Document Based RDF Storage Method for Efficient Parallel Query Processing

Efficient Parallel Processing of Analytical Queries on Linked Data

Notes

https://www.w3.org/TR/rdf11-concepts/.
http://hbase.apache.org/.
http://accumulo.apache.org/.
https://www.stardog.com.
http://graphdb.ontotext.com/.
https://www.w3.org/TR/r2rml/.
https://github.com/dbilid/experiments.
This was verified with the TriAD implementors.
https://github.com/dbilid/PARJ-Ontop.

References

Abadi, D.J., Marcus, A., Madden, S., Hollenbach, K.J.: Scalable semantic web data management using vertical partitioning. In: Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23–27, 2007, pp. 411–422 (2007)
Abdelaziz, I., Harbi, R., Khayyat, Z., Kalnis, P.: A survey and experimental comparison of distributed SPARQL engines for very large RDF data. PVLDB 10(13), 2049–2060 (2017)
Google Scholar
Al-Harbi, R., Abdelaziz, I., Kalnis, P., Mamoulis, N., Ebrahim, Y., Sahli, M.: Accelerating SPARQL queries by exploiting hash-based locality and adaptive partitioning. VLDB J. 25(3), 355–380 (2016)
Article Google Scholar
Albutiu, M.C., Kemper, A., Neumann, T.: Massively parallel sort-merge joins in main memory multi-core database systems. Proc. VLDB Endow. 5(10), 1064–1075 (2012)
Article Google Scholar
Alexaki, S., Christophides, V., Karvounarakis, G., Plexousakis, D., Tolle, K.: The ICS-FORTH RDFSuite: managing voluminous RDF description bases. In: SemWeb (2001)
Aluç, G., Hartig, O., Özsu, M.T., Daudjee, K.: Diversified stress testing of RDF data management systems. In: The Semantic Web—ISWC 2014—13th International Semantic Web Conference, Riva del Garda, Italy, October 19–23, 2014. Proceedings, Part I, pp. 197–212 (2014)
Armbrust, M., Xin, R.S., Lian, C., Huai, Y., Liu, D., Bradley, J.K., Meng, X., Kaftan, T., Franklin, M.J., Ghodsi, A., Zaharia, M.: Spark SQL: relational data processing in spark. In: SIGMOD Conference, pp. 1383–1394. ACM (2015)
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: DBpedia: a nucleus for a web of open data. In: The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11–15, pp. 722–735 (2007)
Bilidas, D., Koubarakis, M.: Scalable parallelization of RDF joins on multicore architectures. In: Advances in Database Technology—22nd International Conference on Extending Database Technology, EDBT 2019, Lisbon, Portugal, March 26–29, 2019, pp. 349–360 (2019). https://doi.org/10.5441/002/edbt.2019.31
Borovica-Gajic, R., Idreos, S., Ailamaki, A., Zukowski, M., Fraser, C.: Smooth scan: Statistics-oblivious access paths. In: 2015 IEEE 31st International Conference on Data Engineering (ICDE), pp. 315–326. IEEE (2015)
Borovica-Gajic, R., Idreos, S., Ailamaki, A., Zukowski, M., Fraser, C.: Smooth scan: robust access path selection without cardinality estimation. VLDB J. 1–25 (2018)
Bursztyn, D., Goasdoué, F., Manolescu, I.: Teaching an RDBMS about ontological constraints. Proc. VLDB Endow. 9(12), 1161–1172 (2016)
Article Google Scholar
Calvanese, D., Cogrel, B., Komla-Ebri, S., Kontchakov, R., Lanti, D., Rezk, M., Rodriguez-Muro, M., Xiao, G.: Ontop: answering SPARQL queries over relational databases. Semant. Web 8(3), 471–487 (2017)
Article Google Scholar
Chortaras, A., Trivela, D., Stamou, G.: Optimized query rewriting for OWL 2 QL. In: International Conference on Automated Deduction, pp. 192–206. Springer (2011)
Du, J., Wang, H., Ni, Y., Yu, Y.: HadoopRDF: a scalable semantic data analytical engine. In: Intelligent Computing Theories and Applications—8th International Conference, ICIC 2012, Huangshan, China, July 25–29, 2012. Proceedings, pp. 633–641 (2012)
Groppe, J., Groppe, S.: Parallelizing join computations of SPARQL queries for large semantic web databases. In: Proceedings of the 2011 ACM Symposium on Applied Computing, pp. 1681–1686. ACM (2011)
Guo, Y., Pan, Z., Heflin, J.: LUBM: A benchmark for OWL knowledge base systems. J. Web Sem. 3(2–3), 158–182 (2005)
Article Google Scholar
Gurajada, S., Seufert, S., Miliaraki, I., Theobald, M.: Triad: a distributed shared-nothing RDF engine based on asynchronous message passing. In: International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22–27, 2014, pp. 289–300 (2014)
Hoffart, J., Suchanek, F.M., Berberich, K., Weikum, G.: YAGO2: a spatially and temporally enhanced knowledge base from wikipedia. Artif. Intell. 194, 28–61 (2013)
Article MathSciNet Google Scholar
Huang, J., Abadi, D.J., Ren, K.: Scalable SPARQL querying of large RDF graphs. PVLDB 4(11), 1123–1134 (2011)
Google Scholar
Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, K.S., Kersten, M.L.: Monetdb: two decades of research in column-oriented database architectures. IEEE Data Eng. Bull. 35(1), 40–45 (2012)
Google Scholar
Kaoudi, Z., Manolescu, I.: RDF in the clouds: a survey. VLDB J. 24(1), 67–91 (2015)
Article Google Scholar
Kharlamov, E., Hovland, D., Skjæveland, M.G., Bilidas, D., Jiménez-Ruiz, E., Xiao, G., Soylu, A., Lanti, D., Rezk, M., Zheleznyakov, D., et al.: Ontology based data access in statoil. J. Web Semant. 44, 3–36 (2017)
Article Google Scholar
Kikot, S., Kontchakov, R., Zakharyaschev, M.: Conjunctive query answering with OWL 2 QL. In: Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning (2012)
Kim, C., Kaldewey, T., Lee, V.W., Sedlar, E., Nguyen, A.D., Satish, N., Chhugani, J., Di Blas, A., Dubey, P.: Sort vs hash revisited: fast join implementation on modern multi-core CPUs. Proc. VLDB Endow. 2(2), 1378–1389 (2009)
Article Google Scholar
Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The combined approach to ontology-based data access. In: Twenty-second international joint conference on artificial intelligence (2011)
Luo, Y., Picalausa, F., Fletcher, G.H., Hidders, J., Vansummeren, S.: Storing and indexing massive RDF datasets. In: Semantic search over the web, pp. 31–60. Springer (2012)
Lutz, C., Seylan, I., Toman, D., Wolter, F.: The combined approach to OBDA: taming role hierarchies using filters. In: International semantic web conference, pp. 314–330. Springer (2013)
Manegold, S., Boncz, P., Kersten, M.: Optimizing main-memory join on modern hardware. IEEE Trans. Knowl. Data Eng. 14(4), 709–730 (2002)
Article Google Scholar
Manegold, S., Boncz, P., Kersten, M.L.: Generic database cost models for hierarchical memory systems. In: VLDB’02: Proceedings of the 28th International Conference on Very Large Databases, pp. 191–202. Elsevier (2002)
Mora, J., Corcho, Ó.: Engineering optimisations in query rewriting for OBDA. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 41–48. ACM (2013)
Myung, J., Yeon, J., Lee, S.g.: Sparql basic graph pattern processing with iterative mapreduce. In: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud. ACM (2010)
Nenov, Y., Piro, R., Motik, B., Horrocks, I., Wu, Z., Banerjee, J.: RDFox: a highly-scalable RDF store. In: International Semantic Web Conference, pp. 3–20. Springer (2015)
Neumann, T., Moerkotte, G.: Characteristic sets: accurate cardinality estimation for RDF queries with multiple joins. In: ICDE, pp. 984–994. IEEE Computer Society (2011)
Neumann, T., Weikum, G.: Scalable join processing on very large RDF graphs. In: SIGMOD Conference, pp. 627–640. ACM (2009)
Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig latin: a not-so-foreign language for data processing. In: SIGMOD Conference, pp. 1099–1110. ACM (2008)
Papailiou, N., Tsoumakos, D., Konstantinou, I., Karras, P., Koziris, N.: H\({}_{{2}}\)RDF+: an efficient data management system for big RDF graphs. In: SIGMOD Conference, pp. 909–912. ACM (2014)
Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data to ontologies. In: Journal on data semantics X, pp. 133–173. Springer (2008)
Potter, A., Motik, B., Nenov, Y., Horrocks, I.: Distributed RDF query answering with dynamic data exchange. In: International Semantic Web Conference (1), Lecture Notes in Computer Science, vol. 9981, pp. 480–497 (2016)
Punnoose, R., Crainiceanu, A., Rapp, D.: SPARQL in the cloud using Rya. Inf. Syst. 48, 181–195 (2015)
Article Google Scholar
Qin, W., Idreos, S.: Adaptive data skipping in main-memory systems. In: Proceedings of the 2016 International Conference on Management of Data, pp. 2255–2256. ACM (2016)
Ravindra, P., Kim, H., Anyanwu, K.: An intermediate algebra for optimizing RDF graph pattern matching on mapreduce. In: ESWC (2), Lecture Notes in Computer Science, vol. 6644, pp. 46–61. Springer (2011)
Rodriguez-Muro, M., Kontchakov, R., Zakharyaschev, M.: Ontology-based data access: ontop of databases. In: International Semantic Web Conference, pp. 558–573. Springer (2013)
Rohloff, K., Schantz, R.E.: High-performance, massively scalable distributed systems using the mapreduce software framework: the SHARD triple-store. In: PSI EtA, p. 4. ACM (2010)
Rohloff, K., Schantz, R.E.: Clause-iteration with mapreduce to scalably query datagraphs in the SHARD graph-store. In: DICT@HPDC, pp. 35–44. ACM (2011)
Rosati, R., Almatelli, A.: Improving query answering over DL-Lite ontologies. In: Twelfth International Conference on the Principles of Knowledge Representation and Reasoning (2010)
Schätzle, A., Przyjaciel-Zablocki, M., Lausen, G.: PigSPARQL: mapping SPARQL to Pig Latin. In: SWIM, p. 4. ACM (2011)
Schätzle, A., Przyjaciel-Zablocki, M., Skilevic, S., Lausen, G.: S2RDF: RDF querying with SPARQL on spark. PVLDB 9(10), 804–815 (2016)
Google Scholar
Schmachtenberg, M., Bizer, C., Paulheim, H.: Adoption of the linked data best practices in different topical domains. In: Semantic Web Conference (1), Lecture Notes in Computer Science, vol. 8796, pp. 245–260. Springer (2014)
Ślezak, D., Wróblewski, J., Eastwood, V., Synak, P.: Brighthouse: an analytic data warehouse for ad-hoc queries. Proc. VLDB Endow. 1(2), 1337–1345 (2008)
Article Google Scholar
Stefanoni, G., Motik, B., Kostylev, E.V.: Estimating the cardinality of conjunctive queries over RDF data using graph summarisation. In: Proceedings of the 2018 World Wide Web Conference on World Wide Web, pp. 1043–1052. International World Wide Web Conferences Steering Committee (2018)
Stonebraker, M., Abadi, D.J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E.J., O’Neil, P.E., Rasin, A., Tran, N., Zdonik, S.B.: C-store: a column-oriented DBMS. In: VLDB, pp. 553–564. ACM (2005)
Subercaze, J., Gravier, C., Chevalier, J., Laforest, F.: Inferray: fast in-memory RDF inference. Proc. VLDB Endow. 9(6), 468–479 (2016)
Article Google Scholar
Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. PVLDB 1(1), 1008–1019 (2008)
Google Scholar
Wilkinson, K., Sayers, C., Kuno, H.A., Reynolds, D.: Efficient RDF storage and retrieval in jena2. In: SWDB, pp. 131–150 (2003)
Xiao, G., Hovland, D., Bilidas, D., Rezk, M., Giese, M., Calvanese, D.: Efficient ontology-based data integration with canonical IRIs. In: European Semantic Web Conference, pp. 697–713. Springer (2018)
Yuan, P., Liu, P., Wu, B., Jin, H., Zhang, W., Liu, L.: Triplebit: a fast and compact system for large scale RDF data. Proc. VLDB Endow. 6(7), 517–528 (2013)
Article Google Scholar
Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for web scale RDF data. PVLDB 6(4), 265–276 (2013)
Google Scholar

Download references

Acknowledgements

The present work was funded by the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 825258.

Author information

Authors and Affiliations

National and Kapodistrian University of Athens, Athens, Greece
Dimitris Bilidas & Manolis Koubarakis

Authors

Dimitris Bilidas
View author publications
You can also search for this author in PubMed Google Scholar
Manolis Koubarakis
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Dimitris Bilidas.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Bilidas, D., Koubarakis, M. In-memory parallelization of join queries over large ontological hierarchies. Distrib Parallel Databases 39, 545–582 (2021). https://doi.org/10.1007/s10619-020-07305-y

Download citation

Published: 29 June 2020
Issue Date: September 2021
DOI: https://doi.org/10.1007/s10619-020-07305-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

In-memory parallelization of join queries over large ontological hierarchies

Abstract

Access this article

Similar content being viewed by others

Universal Storage Adaption for Distributed RDF-Triple Stores

Document Based RDF Storage Method for Efficient Parallel Query Processing

Efficient Parallel Processing of Analytical Queries on Linked Data

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

In-memory parallelization of join queries over large ontological hierarchies

Abstract

Access this article

Similar content being viewed by others

Universal Storage Adaption for Distributed RDF-Triple Stores

Document Based RDF Storage Method for Efficient Parallel Query Processing

Efficient Parallel Processing of Analytical Queries on Linked Data

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation