AI-KG: An Automatically Generated Knowledge Graph of Artificial Intelligence

Part of the Lecture Notes in Computer Science book series (LNISA, volume 12507)

Abstract

Scientific knowledge has traditionally been disseminated and preserved through research articles published in journals, conference proceedings, and online archives. However, this article-centric paradigm has often been criticized because it does not allow this knowledge to be automatically processed, categorized, and reasoned over. An alternative vision is to generate a semantically rich and interlinked description of the content of research publications. In this paper, we present the Artificial Intelligence Knowledge Graph (AI-KG), a large-scale automatically generated knowledge graph that describes 820K research entities. AI-KG includes about 14M RDF triples and 1.2M reified statements extracted from 333K research publications in the field of AI, and describes 5 types of entities (tasks, methods, metrics, materials, others) linked by 27 relations. AI-KG has been designed to support a variety of intelligent services for analyzing and making sense of research dynamics, supporting researchers in their daily work, and informing decision-making by funding bodies and research policymakers. AI-KG has been generated by an automatic pipeline that extracts entities and relationships using three tools: DyGIE++, Stanford CoreNLP, and the CSO Classifier. The pipeline then integrates and filters the resulting triples using a combination of deep learning and semantic technologies in order to produce a high-quality knowledge graph. The pipeline was evaluated on a manually crafted gold standard, yielding competitive results. AI-KG is available under CC BY 4.0 and can be downloaded as a dump or queried via a SPARQL endpoint.
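
The abstract refers to reified statements. As a rough illustration only, and not the authors' pipeline or the actual AI-KG schema, the sketch below shows how standard RDF reification describes a single triple as an rdf:Statement resource so that metadata can be attached to it. The `http://example.org/aikg-sketch/` namespace, the entity and relation names, and the `support` property are hypothetical; only the `rdf:` terms come from the RDF vocabulary.

```python
from typing import List, NamedTuple

# Standard RDF namespace.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace used only for this illustration (not the AI-KG one).
EX = "http://example.org/aikg-sketch/"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def reify(statement_iri: str, triple: Triple, support: int) -> List[Triple]:
    """Describe `triple` as an rdf:Statement and attach one metadata value."""
    return [
        Triple(statement_iri, RDF + "type", RDF + "Statement"),
        Triple(statement_iri, RDF + "subject", triple.subject),
        Triple(statement_iri, RDF + "predicate", triple.predicate),
        Triple(statement_iri, RDF + "object", triple.obj),
        # Hypothetical property: how many papers back this claim.
        Triple(statement_iri, EX + "support", str(support)),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize to a simplified N-Triples form (no literal escaping or datatypes)."""
    lines = []
    for s, p, o in triples:
        # Treat anything that looks like an IRI as a resource, else as a plain literal.
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical claim: one method uses another.
    claim = Triple(EX + "method_a", EX + "usesMethod", EX + "method_b")
    print(to_ntriples(reify(EX + "statement_1", claim, support=3)))
```

Reifying a claim this way lets provenance, such as the number of supporting publications, hang off the statement resource rather than off the entities themselves.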

Keywords

  • Artificial Intelligence
  • Scholarly data
  • Knowledge graph
  • Information Extraction
  • Natural Language Processing

Chapter
  • DOI: 10.1007/978-3-030-62466-8_9
  • Chapter length: 17 pages

Notes

  1. http://ontoware.org/swrc.
  2. http://bibliontology.com.
  3. http://purl.org/spar/bido.
  4. http://cso.kmi.open.ac.uk.
  5. https://opencitations.net/.
  6. https://www.openacademic.ai/oag/.
  7. https://www.orkg.org/orkg/.
  8. http://w3id.org/aida/.
  9. http://w3id.org/aikg.
  10. http://wit.istc.cnr.it/stlab-tools/fred/.
  11. https://scienceie.github.io/.
  12. http://w3id.org/aikg/aikg/ontology.
  13. https://www.w3.org/2004/02/skos/.
  14. https://www.w3.org/TR/prov-o/.
  15. https://www.w3.org/OWL/.
  16. https://github.com/angelosalatino/cso-classifier.
  17. https://nlp.stanford.edu/software/tagger.shtml.
  18. We thank NVIDIA Corp. for the donation of 1 Titan Xp GPU used in this research.
  19. https://www.nltk.org/howto/wordnet.html.
  20. https://www.w3.org/TR/rdf-mt/#Reif.
  21. https://cso.kmi.open.ac.uk/participate.

Author information

Corresponding author

Correspondence to Danilo Dessì.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Dessì, D., Osborne, F., Reforgiato Recupero, D., Buscaldi, D., Motta, E., Sack, H. (2020). AI-KG: An Automatically Generated Knowledge Graph of Artificial Intelligence. In: , et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12507. Springer, Cham. https://doi.org/10.1007/978-3-030-62466-8_9

  • DOI: https://doi.org/10.1007/978-3-030-62466-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62465-1

  • Online ISBN: 978-3-030-62466-8

  • eBook Packages: Computer Science, Computer Science (R0)