PyGraft: Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips

  • Conference paper
The Semantic Web (ESWC 2024)

Abstract

Knowledge graphs (KGs) have emerged as a prominent data representation and management paradigm. Being usually underpinned by a schema (e.g., an ontology), KGs capture not only factual information but also contextual knowledge. In some tasks, a few KGs established themselves as standard benchmarks. However, recent works outline that relying on a limited collection of datasets is not sufficient to assess the generalization capability of an approach. In some data-sensitive fields such as education or medicine, access to public datasets is even more limited. To remedy the aforementioned issues, we release PyGraft, a Python-based tool that generates highly customized, domain-agnostic schemas and KGs. The synthesized schemas encompass various RDFS and OWL constructs, while the synthesized KGs emulate the characteristics and scale of real-world KGs. Logical consistency of the generated resources is ultimately ensured by running a description logic (DL) reasoner. By providing a way of generating both a schema and KG in a single pipeline, PyGraft’s aim is to empower the generation of a more diverse array of KGs for benchmarking novel approaches in areas such as graph-based machine learning (ML), or more generally KG processing. In graph-based ML in particular, this should foster a more holistic evaluation of model performance and generalization capability, thereby going beyond the limited collection of available benchmarks. PyGraft is available at: https://github.com/nicolas-hbt/pygraft.
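The single-pipeline idea described above (a schema first, then a KG that conforms to it, then a consistency check) can be illustrated with a toy sketch. This is not PyGraft's API or algorithm: the functions `generate_schema`, `generate_kg`, `superclasses`, and `is_consistent` are hypothetical names introduced here, the schema is reduced to a subclass tree plus relation domains and ranges, and a plain domain/range check stands in for the description logic reasoner.

```python
import random


def generate_schema(n_classes, n_relations, rng):
    """Toy schema: a subclass tree plus relations with a domain and a range."""
    classes = [f"C{i}" for i in range(n_classes)]
    # Every class except C0 gets a parent with a smaller index, so no cycles arise.
    parent = {classes[i]: classes[rng.randrange(i)] for i in range(1, n_classes)}
    relations = {
        f"r{j}": {"domain": rng.choice(classes), "range": rng.choice(classes)}
        for j in range(n_relations)
    }
    return {"classes": classes, "parent": parent, "relations": relations}


def superclasses(cls, parent):
    """Return cls followed by all of its superclasses up to the root."""
    chain = [cls]
    while cls in parent:
        cls = parent[cls]
        chain.append(cls)
    return chain


def generate_kg(schema, n_entities, n_triples, rng):
    """Type entities at random, then sample triples that respect domain and range."""
    types = {f"e{k}": rng.choice(schema["classes"]) for k in range(n_entities)}
    # members[c] lists the entities typed with c or with one of its subclasses.
    members = {c: [] for c in schema["classes"]}
    for entity, cls in types.items():
        for sup in superclasses(cls, schema["parent"]):
            members[sup].append(entity)
    # A relation is usable only if its domain and range both have instances.
    usable = [
        (name, spec) for name, spec in schema["relations"].items()
        if members[spec["domain"]] and members[spec["range"]]
    ]
    triples = set()  # duplicates collapse, so n_triples is an upper bound
    for _ in range(n_triples if usable else 0):
        name, spec = rng.choice(usable)
        head = rng.choice(members[spec["domain"]])
        tail = rng.choice(members[spec["range"]])
        triples.add((head, name, tail))
    return types, triples


def is_consistent(schema, types, triples):
    """Stand-in for a reasoner: check every triple against domain and range."""
    for head, name, tail in triples:
        spec = schema["relations"][name]
        if spec["domain"] not in superclasses(types[head], schema["parent"]):
            return False
        if spec["range"] not in superclasses(types[tail], schema["parent"]):
            return False
    return True


rng = random.Random(0)  # fixed seed so the run is reproducible
schema = generate_schema(n_classes=5, n_relations=4, rng=rng)
types, triples = generate_kg(schema, n_entities=20, n_triples=50, rng=rng)
print(len(triples), is_consistent(schema, types, triples))
```

Generating triples from the schema, rather than checking random triples afterwards, keeps the toy KG consistent by construction; a real reasoner is still needed once richer OWL constructs interact.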

Notes

  1. https://github.com/igraph/python-igraph/.

  2. https://github.com/networkx/networkx/.

  3. https://github.com/snap-stanford/snap-python/.

  4. https://www.w3.org/RDFS/.

  5. https://www.w3.org/OWL/.

  6. https://github.com/pwin/owlready2/.

  7. https://github.com/RDFLib/rdflib/.

  8. An instance triple should also be added. This is because some property combinations such as owl:SymmetricProperty and owl:AsymmetricProperty are not flagged as logically inconsistent per se in OWL. However, a relation qualified by these two properties is not allowed to connect any instances.

  9. https://www.w3.org/TR/owl2-profiles/#ref-owl-2-rdf-semantics/.

  10. https://oops.linkeddata.es/.
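The point made in note 8 can be checked by hand with a small sketch. It assumes a hypothetical helper `violations` and a plain-Python encoding of property characteristics; it is not how an OWL reasoner works internally, only an illustration of why a relation declared both symmetric and asymmetric causes no conflict until an instance triple uses it.

```python
def violations(properties, triples):
    """Return the triples that break asymmetry once symmetry is applied.

    properties maps a relation name to a set of characteristics,
    for example {"r": {"symmetric", "asymmetric"}}.
    """
    # Symmetry: (h, r, t) entails (t, r, h). One pass is enough,
    # because reversing an entailed triple gives back an asserted one.
    closed = set(triples)
    for h, r, t in triples:
        if "symmetric" in properties.get(r, set()):
            closed.add((t, r, h))
    # Asymmetry: (h, r, t) and (t, r, h) must never hold together.
    return sorted(
        (h, r, t)
        for h, r, t in closed
        if "asymmetric" in properties.get(r, set()) and (t, r, h) in closed
    )


props = {"r": {"symmetric", "asymmetric"}}
print(violations(props, []))                  # schema alone: nothing to violate
print(violations(props, [("a", "r", "b")]))   # one instance triple exposes the clash
```

With an empty triple set the function returns an empty list, which mirrors the note: the conflict only surfaces once the relation connects at least one pair of instances.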

Corresponding author

Correspondence to Nicolas Hubert.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hubert, N., Monnin, P., d’Aquin, M., Monticolo, D., Brun, A. (2024). PyGraft: Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips. In: Meroño Peñuela, A., et al. The Semantic Web. ESWC 2024. Lecture Notes in Computer Science, vol 14665. Springer, Cham. https://doi.org/10.1007/978-3-031-60635-9_1

  • DOI: https://doi.org/10.1007/978-3-031-60635-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-60634-2

  • Online ISBN: 978-3-031-60635-9

  • eBook Packages: Computer Science, Computer Science (R0)
