The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data

Part of the Lecture Notes in Computer Science book series (LNISA, volume 11779)

Abstract

In this paper, we present the Microsoft Academic Knowledge Graph (MAKG), a large RDF data set of over eight billion triples containing information about scientific publications and related entities, such as authors, institutions, journals, and fields of study. The data set is licensed under the Open Data Commons Attribution License (ODC-By). By providing the data both as RDF dump files and as a data source in the Linked Open Data cloud, with resolvable URIs and links to other data sources, we bring a vast amount of scholarly data to the Web of Data. Furthermore, we provide entity embeddings for all 210 million publications represented in the graph. The MAKG supports a number of use cases, particularly in the field of digital libraries, such as (1) entity-centric exploration of papers, researchers, affiliations, etc.; (2) data integration tasks that use RDF as a common data model and exploit links to other data sources; and (3) data analysis and knowledge discovery on scholarly data.
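
As a minimal sketch of how the resolvable URIs can be dereferenced programmatically, the Python snippet below sends an HTTP GET request with an Accept header, mirroring the curl commands given in the notes. The helper names (build_rdf_request, fetch_rdf) and the media-type table are illustrative assumptions rather than part of any MAKG tooling; whether ma-graph.org is still reachable, and which media types it honors, are likewise assumptions.

```python
import urllib.request

# Entity URI taken from the curl example in the notes.
ENTITY_URI = "http://ma-graph.org/entity/2826592117"

# Media types for common RDF serializations (assumption: the server honors them).
# "text/n3" is the type used in the notes; "text/turtle" is the registered Turtle type.
RDF_MEDIA_TYPES = {
    "n3": "text/n3",
    "turtle": "text/turtle",
    "ntriples": "application/n-triples",
    "rdfxml": "application/rdf+xml",
}


def build_rdf_request(uri: str, fmt: str = "turtle") -> urllib.request.Request:
    """Build (but do not send) a GET request asking for an RDF description of `uri`."""
    if fmt not in RDF_MEDIA_TYPES:
        raise ValueError(f"unsupported format: {fmt}")
    return urllib.request.Request(uri, headers={"Accept": RDF_MEDIA_TYPES[fmt]})


def fetch_rdf(uri: str, fmt: str = "turtle", timeout: float = 30.0) -> str:
    """Dereference `uri` via content negotiation and return the body as text.

    This performs network I/O and fails if the service is unavailable.
    """
    req = build_rdf_request(uri, fmt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For example, fetch_rdf(ENTITY_URI, "n3") would return the N3 description of the entity from the notes, assuming the service answers; build_rdf_request can be inspected offline without any network access.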

Keywords

  • Scholarly data
  • Knowledge graph
  • Digital libraries

Notes

  1.

    The values are based on SPARQL queries executed against our data set presented in Sect. 3.

  2.

    See https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/.

  3.

    Both the initial MAG data set and the MAKG provided by us are licensed under the Open Data Commons Attribution License (ODC-By; https://opendatacommons.org/licenses/by/1-0/index.html; last access: April 9, 2019).

  4.

    The source code is available online at https://github.com/michaelfaerber/MAG2RDF.

  5.

    See https://www.grid.ac/.

  6.

    The MAKG is also available at the persistent URI https://w3id.org/makg/.

  7.

    See http://doi.org/10.5281/zenodo.2159723.

  8.

    See the S3 bucket arn:aws:s3:::ma-kg.

  9.

    See, e.g., curl -H "Accept: text/n3" http://ma-graph.org/entity/2826592117 and curl -H "Accept: text/ttl" http://ma-graph.org/entity/2826592117.

  10.

    See https://www.springernature.com/de/researchers/scigraph.

  11.

    See http://wikicite.org/.

  12.

    In our paper, the term “citations” refers to in-text citations while “references” refers to links on the document level.

  13.

    See http://clair.eecs.umich.edu/aan/index.php.

  14.

    See https://www.comp.nus.edu.sg/~sugiyama/Dataset2.html.

  15.

    The source code is available online at https://github.com/michaelfaerber/makg-linking. The mappings are available on our website as N-Triples (.nt) files containing owl:sameAs statements.

  16.

    Note that only the number of citations is listed and not the number of references, because references are modeled in the MAKG via a relation (cito:cites). There are 1,380,196,397 references in the MAKG.

  17.

    See https://docs.microsoft.com/en-us/academic-services/graph/get-started-setup-provisioning#open-data-license-odc-by.

  18.

    See http://lov.okfn.org/vocommons/voaf.

  19.

    See http://www.w3.org/TR/void/.

  20.

    See http://5stardata.info/.

  21.

    The paper by Sinha et al. [1] had received 187 citations as of March 29, 2019, according to Google Scholar.

  22.

    See https://doi.org/10.5281/zenodo.2159723 (as of April 10, 2019). Note that the data set is also available at http://ma-graph.org/ and on Amazon S3.

  23.

    See http://ma-graph.org/usage-statistics/ for usage statistics concerning the website and the SPARQL endpoint.

  24.

    See https://www.openacademic.ai/oag/.
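
The notes state that the linking mappings are distributed as N-Triples files with owl:sameAs statements and that references are modeled via the cito:cites relation. A minimal sketch for working with such files offline, assuming one triple per line as in N-Triples, is a single streaming pass that collects owl:sameAs links and counts cito:cites triples. The function name scan_ntriples is hypothetical, and its regular expression is a deliberate simplification: it only matches triples whose subject, predicate, and object are all IRIs, so it is not a full N-Triples parser (a library such as rdflib would be more robust).

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
CITO_CITES = "http://purl.org/spar/cito/cites"

# Matches "<s> <p> <o> ." where all three terms are IRIs.
# Triples with literal or blank-node objects do not match and are skipped.
IRI_TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def scan_ntriples(lines: Iterable[str]) -> Tuple[Dict[str, List[str]], int]:
    """Collect owl:sameAs links and count cito:cites triples in one pass."""
    same_as: Dict[str, List[str]] = defaultdict(list)
    cites = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = IRI_TRIPLE.match(line)
        if match is None:
            continue  # not an IRI-only triple; irrelevant for this scan
        subj, pred, obj = match.groups()
        if pred == OWL_SAME_AS:
            same_as[subj].append(obj)
        elif pred == CITO_CITES:
            cites += 1
    return dict(same_as), cites
```

Because the function accepts any iterable of lines, a compressed dump could be streamed by passing it the handle returned by gzip.open("makg_dump.nt.gz", "rt", encoding="utf-8"); that file name is hypothetical.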

References

  1. Sinha, A., et al.: An overview of Microsoft Academic Service (MAS) and applications. In: Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, pp. 243–246 (2015)

  2. Peroni, S., Dutton, A., Gray, T., Shotton, D.M.: Setting our bibliographic references free: towards open citation data. J. Doc. 71(2), 253–277 (2015)

  3. Aleman-Meza, B., Hakimpour, F., Arpinar, I.B., Sheth, A.P.: SwetoDblp ontology of computer science publications. J. Web Semant. 5(3), 151–155 (2007)

  4. Wang, R., et al.: AceKG: a large-scale knowledge graph for academic data mining. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, pp. 1487–1490 (2018)

  5. Aslam, M.A., Aljohani, N.R.: SPedia: a central hub for the linked open data of scientific publications. Int. J. Semant. Web Inf. Syst. 13(1), 128–146 (2017)

  6. Nuzzolese, A.G., Gentile, A.L., Presutti, V., Gangemi, A.: Conference linked data: the scholarlydata project. In: Groth, P., et al. (eds.) ISWC 2016. LNCS, vol. 9982, pp. 150–158. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46547-0_16

  7. Nuzzolese, A.G., Gentile, A.L., Presutti, V., Gangemi, A.: Semantic web conference ontology - a refactoring solution. In: Sack, H., Rizzo, G., Steinmetz, N., Mladenić, D., Auer, S., Lange, C. (eds.) ESWC 2016. LNCS, vol. 9989, pp. 84–87. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47602-5_18

  8. Gentile, A.L., Acosta, M., Costabello, L., Nuzzolese, A.G., Presutti, V., Recupero, D.R.: Conference live: accessible and sociable conference semantic data. In: Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, pp. 1007–1012 (2015)

  9. Konstantinou, N., Spanos, D., Houssos, N., Mitrou, N.: Exposing scholarly information as Linked Open Data: RDFizing DSpace contents. Electron. Libr. 32(6), 834–851 (2014)

  10. Peroni, S., Shotton, D.: The SPAR ontologies. In: Vrandečić, D., et al. (eds.) ISWC 2018. LNCS, vol. 11137, pp. 119–136. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00668-6_8

  11. Zhang, L., Rettinger, A.: X-LiSA: cross-lingual semantic annotation. PVLDB 7(13), 1693–1696 (2014)

  12. Färber, M., Thiemann, A., Jatowt, A.: A high-quality gold standard for citation-based tasks. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018, pp. 1885–1889 (2018)

  13. Saier, T., Färber, M.: Bibliometric-enhanced arXiv: a data set for paper-based and citation-based tasks. In: Proceedings of the 8th International Workshop on Bibliometric-enhanced Information Retrieval, BIR 2019, pp. 14–26 (2019)

  14. Herrmannova, D., Knoth, P.: An analysis of the Microsoft academic graph. D-Lib Mag. 22(9/10) (2016)

  15. Janowicz, K., Hitzler, P., Adams, B., Kolas, D., Vardeman, C.: Five stars of linked data vocabulary use. Semant. Web 5(3), 173–176 (2014)

  16. Ristoski, P., Paulheim, H.: RDF2Vec: RDF graph embeddings for data mining. In: Groth, P., et al. (eds.) ISWC 2016. LNCS, vol. 9981, pp. 498–514. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46523-4_30

  17. Carrasco, M.H., Luján-Mora, S., Maté, A., Trujillo, J.: Current state of linked data in digital libraries. J. Inf. Sci. 42(2), 117–127 (2016)

  18. Fathalla, S., Vahdati, S., Auer, S., Lange, C.: Towards a knowledge graph representing research findings by semantifying survey articles. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds.) TPDL 2017. LNCS, vol. 10450, pp. 315–327. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67008-9_25

  19. Färber, M., Nishioka, C., Jatowt, A.: ScholarSight: visualizing temporal trends of scientific concepts. In: Proceedings of the 19th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019, pp. 436–437 (2019)

  20. Färber, M., Sampath, A., Jatowt, A.: PaperHunter: a system for exploring papers and citation contexts. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds.) ECIR 2019. LNCS, vol. 11438, pp. 246–250. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-15719-7_33

  21. Hug, S.E., Ochsner, M., Brändle, M.P.: Citation analysis with Microsoft academic. Scientometrics 111(1), 371–378 (2017)

  22. Mohapatra, D., Maiti, A., Bhatia, S., Chakraborty, T.: Go wide, go deep: quantifying the impact of scientific papers through influence dispersion trees. In: Proceedings of the 19th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019, pp. 305–314 (2019)

  23. Fire, M., Guestrin, C.: Over-optimization of academic publishing metrics: observing Goodhart’s law in action. CoRR abs/1809.07841 (2018)

  24. Hoffman, M.R., Ibáñez, L.-D., Fryer, H., Simperl, E.: Smart papers: dynamic publications on the blockchain. In: Gangemi, A., et al. (eds.) ESWC 2018. LNCS, vol. 10843, pp. 304–318. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93417-4_20

  25. Jaradeh, M.Y., Auer, S., Prinz, M., Kovtun, V., Kismihók, G., Stocker, M.: Open research knowledge graph: towards machine actionability in scholarly communication. CoRR abs/1901.10816 (2019)

Author information

Correspondence to Michael Färber.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Färber, M. (2019). The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data. In: , et al. The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol. 11779. Springer, Cham. https://doi.org/10.1007/978-3-030-30796-7_8

  • DOI: https://doi.org/10.1007/978-3-030-30796-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30795-0

  • Online ISBN: 978-3-030-30796-7

  • eBook Packages: Computer Science, Computer Science (R0)