Abstract
Knowledge graphs (KGs) have emerged as a prominent data representation and management paradigm. Usually underpinned by a schema (e.g., an ontology), KGs capture not only factual information but also contextual knowledge. For some tasks, a few KGs have established themselves as standard benchmarks. However, recent works point out that relying on a limited collection of datasets is not sufficient to assess the generalization capability of an approach. In data-sensitive fields such as education or medicine, access to public datasets is even more limited. To remedy these issues, we release PyGraft, a Python-based tool that generates highly customized, domain-agnostic schemas and KGs. The synthesized schemas encompass various RDFS and OWL constructs, while the synthesized KGs emulate the characteristics and scale of real-world KGs. Logical consistency of the generated resources is ultimately ensured by running a description logic (DL) reasoner. By generating both a schema and a KG in a single pipeline, PyGraft aims to enable the generation of a more diverse array of KGs for benchmarking novel approaches in areas such as graph-based machine learning (ML) or, more generally, KG processing. In graph-based ML in particular, this should foster a more holistic evaluation of model performance and generalization capability, going beyond the limited collection of available benchmarks. PyGraft is available at: https://github.com/nicolas-hbt/pygraft.
Notes
- 8. An instance triple should also be added, because some combinations of property characteristics, such as owl:SymmetricProperty and owl:AsymmetricProperty, are not flagged as logically inconsistent per se in OWL. However, a relation declared with both characteristics cannot connect any instances.
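The point made in note 8 can be illustrated with a minimal pure-Python sketch. It is not PyGraft's implementation and does not call a DL reasoner; the function name `violates_symmetric_asymmetric` and the `ex:` identifiers are hypothetical and used only for illustration. The sketch assumes a relation declared both symmetric and asymmetric, and shows that a contradiction only appears once at least one instance triple uses that relation.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def violates_symmetric_asymmetric(relation: str, triples: Iterable[Triple]) -> bool:
    """Toy check (hypothetical helper, not a reasoner).

    Assumes `relation` is declared both symmetric and asymmetric and
    returns True if the given instance triples lead to a contradiction.
    """
    # Keep only the (subject, object) pairs asserted for this relation.
    asserted: Set[Tuple[str, str]] = {(s, o) for s, p, o in triples if p == relation}
    # Symmetry: every asserted pair (s, o) also entails (o, s).
    entailed = asserted | {(o, s) for s, o in asserted}
    # Asymmetry: (s, o) and (o, s) must never hold together.
    return any((o, s) in entailed for s, o in entailed)


# No instance triple: the two declarations alone cause no clash.
print(violates_symmetric_asymmetric("ex:r", []))  # False
# One instance triple is enough to expose the clash.
print(violates_symmetric_asymmetric("ex:r", [("ex:a", "ex:r", "ex:b")]))  # True
```

This is why a consistency check on such a schema only becomes informative once a triple instantiating the relation is present.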
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hubert, N., Monnin, P., d’Aquin, M., Monticolo, D., Brun, A. (2024). PyGraft: Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips. In: Meroño Peñuela, A., et al. The Semantic Web. ESWC 2024. Lecture Notes in Computer Science, vol 14665. Springer, Cham. https://doi.org/10.1007/978-3-031-60635-9_1
DOI: https://doi.org/10.1007/978-3-031-60635-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-60634-2
Online ISBN: 978-3-031-60635-9
eBook Packages: Computer Science, Computer Science (R0)