Abstract
In this paper, we present the Microsoft Academic Knowledge Graph (MAKG), a large RDF data set of over eight billion triples describing scientific publications and related entities, such as authors, institutions, journals, and fields of study. The data set is licensed under the Open Data Commons Attribution License (ODC-By). By providing the data both as RDF dump files and as a data source in the Linked Open Data cloud, with resolvable URIs and links to other data sources, we bring a vast amount of scholarly data to the Web of Data. Furthermore, we provide entity embeddings for all 210 million represented publications. The MAKG facilitates a number of use cases, particularly in the field of digital libraries, such as (1) entity-centric exploration of papers, researchers, affiliations, etc.; (2) data integration tasks using RDF as a common data model and links to other data sources; and (3) data analysis and knowledge discovery of scholarly data.
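Use case (2) can be illustrated with a small sketch, assuming the links to other data sources arrive as N-Triples lines with owl:sameAs statements. The function name same_as_links is hypothetical, the regular expression only handles triples whose three terms are all IRIs, and the example.org targets in the sample data are invented for illustration; only the ma-graph.org entity URI appears in the paper's notes.

```python
import re

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Matches a simple N-Triples statement whose subject, predicate, and object are all IRIs.
TRIPLE_RE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def same_as_links(lines):
    """Map each subject IRI to the list of IRIs it is linked to via owl:sameAs."""
    links = {}
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if match is None:
            continue  # skip comments, blank lines, and triples with literal objects
        subject, predicate, obj = match.groups()
        if predicate == OWL_SAME_AS:
            links.setdefault(subject, []).append(obj)
    return links


# Hypothetical sample data: the example.org IRIs are invented for illustration.
sample = [
    "<http://ma-graph.org/entity/2826592117> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/paper/42> .",
    "# a comment line",
    "<http://ma-graph.org/entity/2826592117> <http://example.org/other> <http://example.org/x> .",
]
links = same_as_links(sample)
```

A full N-Triples parser would also need to handle literals, blank nodes, and escape sequences; the regular expression here is deliberately narrow because owl:sameAs statements link IRIs to IRIs.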
Keywords
- Scholarly data
- Knowledge graph
- Digital libraries
Notes
- 1.
The values are based on SPARQL queries executed against our data set presented in Sect. 3.
- 2.
- 3.
Both the initial MAG data set and the MAKG provided by us are licensed under the Open Data Commons Attribution License (ODC-By; https://opendatacommons.org/licenses/by/1-0/index.html; last access: April 9, 2019).
- 4.
The source code is available online at https://github.com/michaelfaerber/MAG2RDF.
- 5.
See https://www.grid.ac/.
- 6.
The MAKG is also available at the persistent URI https://w3id.org/makg/.
- 7.
- 8.
See the S3 bucket arn:aws:s3:::ma-kg.
- 9.
See, e.g., curl -H "Accept:text/n3" http://ma-graph.org/entity/2826592117 and curl -H "Accept:text/ttl" http://ma-graph.org/entity/2826592117.
- 10.
- 11.
See http://wikicite.org/.
- 12.
In our paper, the term “citations” refers to in-text citations while “references” refers to links on the document level.
- 13.
- 14.
- 15.
The source code is available online at https://github.com/michaelfaerber/makg-linking. The mappings are available as nt files with owl:sameAs statements on our website.
- 16.
Note that only the number of citations is listed and not the number of references, because references are modeled in the MAKG via a relation (cito:cites). There are 1,380,196,397 references in the MAKG.
- 17.
- 18.
- 19.
- 20.
- 21.
The paper by Sinha et al. [1] had received 187 citations as of March 29, 2019, according to Google Scholar.
- 22.
See https://doi.org/10.5281/zenodo.2159723 (as of April 10, 2019). Note that the data set is also available at http://ma-graph.org/ and on Amazon S3.
- 23.
See http://ma-graph.org/usage-statistics/ for usage statistics concerning the website and the SPARQL endpoint.
- 24.
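The content negotiation shown with curl in note 9 can be sketched in Python as follows. This is a minimal sketch, not the author's tooling: the helper names build_rdf_request and fetch_rdf are hypothetical, while the entity URI and the MIME type are taken verbatim from note 9. Building the request is kept separate from sending it so that the header logic can be checked without network access; whether the server still answers is not something the sketch assumes.

```python
import urllib.request

# Entity URI and MIME type as quoted in note 9.
ENTITY_URI = "http://ma-graph.org/entity/2826592117"


def build_rdf_request(uri: str, mime_type: str) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return urllib.request.Request(uri, headers={"Accept": mime_type})


def fetch_rdf(uri: str, mime_type: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text (needs network access)."""
    request = build_rdf_request(uri, mime_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Only the request object is built here; nothing is sent over the network.
request = build_rdf_request(ENTITY_URI, "text/n3")
```

Calling fetch_rdf(ENTITY_URI, "text/n3") would perform the same request as the first curl command in note 9.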
References
Sinha, A., et al.: An overview of Microsoft Academic Service (MAS) and applications. In: Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, pp. 243–246 (2015)
Peroni, S., Dutton, A., Gray, T., Shotton, D.M.: Setting our bibliographic references free: towards open citation data. J. Doc. 71(2), 253–277 (2015)
Aleman-Meza, B., Hakimpour, F., Arpinar, I.B., Sheth, A.P.: SwetoDblp ontology of computer science publications. J. Web Semant. 5(3), 151–155 (2007)
Wang, R., et al.: AceKG: a large-scale knowledge graph for academic data mining. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, pp. 1487–1490 (2018)
Aslam, M.A., Aljohani, N.R.: SPedia: a central hub for the linked open data of scientific publications. Int. J. Semant. Web Inf. Syst. 13(1), 128–146 (2017)
Nuzzolese, A.G., Gentile, A.L., Presutti, V., Gangemi, A.: Conference linked data: the scholarlydata project. In: Groth, P., et al. (eds.) ISWC 2016. LNCS, vol. 9982, pp. 150–158. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46547-0_16
Nuzzolese, A.G., Gentile, A.L., Presutti, V., Gangemi, A.: Semantic web conference ontology - a refactoring solution. In: Sack, H., Rizzo, G., Steinmetz, N., Mladenić, D., Auer, S., Lange, C. (eds.) ESWC 2016. LNCS, vol. 9989, pp. 84–87. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47602-5_18
Gentile, A.L., Acosta, M., Costabello, L., Nuzzolese, A.G., Presutti, V., Recupero, D.R.: Conference live: accessible and sociable conference semantic data. In: Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, pp. 1007–1012 (2015)
Konstantinou, N., Spanos, D., Houssos, N., Mitrou, N.: Exposing scholarly information as Linked Open Data: RDFizing DSpace contents. Electron. Libr. 32(6), 834–851 (2014)
Peroni, S., Shotton, D.: The SPAR ontologies. In: Vrandečić, D., et al. (eds.) ISWC 2018. LNCS, vol. 11137, pp. 119–136. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00668-6_8
Zhang, L., Rettinger, A.: X-LiSA: cross-lingual semantic annotation. PVLDB 7(13), 1693–1696 (2014)
Färber, M., Thiemann, A., Jatowt, A.: A high-quality gold standard for citation-based tasks. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018, pp. 1885–1889 (2018)
Saier, T., Färber, M.: Bibliometric-enhanced arXiv: a data set for paper-based and citation-based tasks. In: Proceedings of the 8th International Workshop on Bibliometric-enhanced Information Retrieval, BIR 2019, pp. 14–26 (2019)
Herrmannova, D., Knoth, P.: An analysis of the Microsoft academic graph. D-Lib Mag. 22(9/10) (2016)
Janowicz, K., Hitzler, P., Adams, B., Kolas, D., Vardeman, C.: Five stars of linked data vocabulary use. Semant. Web 5(3), 173–176 (2014)
Ristoski, P., Paulheim, H.: RDF2Vec: RDF graph embeddings for data mining. In: Groth, P., et al. (eds.) ISWC 2016. LNCS, vol. 9981, pp. 498–514. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46523-4_30
Carrasco, M.H., Luján-Mora, S., Maté, A., Trujillo, J.: Current state of linked data in digital libraries. J. Inf. Sci. 42(2), 117–127 (2016)
Fathalla, S., Vahdati, S., Auer, S., Lange, C.: Towards a knowledge graph representing research findings by semantifying survey articles. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds.) TPDL 2017. LNCS, vol. 10450, pp. 315–327. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67008-9_25
Färber, M., Nishioka, C., Jatowt, A.: ScholarSight: visualizing temporal trends of scientific concepts. In: Proceedings of the 19th ACM/IEEE on Joint Conference on Digital Libraries, JCDL 2019, pp. 436–437 (2019)
Färber, M., Sampath, A., Jatowt, A.: PaperHunter: a system for exploring papers and citation contexts. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds.) ECIR 2019. LNCS, vol. 11438, pp. 246–250. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-15719-7_33
Hug, S.E., Ochsner, M., Brändle, M.P.: Citation analysis with Microsoft academic. Scientometrics 111(1), 371–378 (2017)
Mohapatra, D., Maiti, A., Bhatia, S., Chakraborty, T.: Go wide, go deep: quantifying the impact of scientific papers through influence dispersion trees. In: Proceedings of the 19th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2019, pp. 305–314 (2019)
Fire, M., Guestrin, C.: Over-optimization of academic publishing metrics: observing Goodhart’s law in action. CoRR abs/1809.07841 (2018)
Hoffman, M.R., Ibáñez, L.-D., Fryer, H., Simperl, E.: Smart papers: dynamic publications on the blockchain. In: Gangemi, A., et al. (eds.) ESWC 2018. LNCS, vol. 10843, pp. 304–318. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93417-4_20
Jaradeh, M.Y., Auer, S., Prinz, M., Kovtun, V., Kismihók, G., Stocker, M.: Open research knowledge graph: towards machine actionability in scholarly communication. CoRR abs/1901.10816 (2019)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Färber, M. (2019). The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data. In: , et al. The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11779. Springer, Cham. https://doi.org/10.1007/978-3-030-30796-7_8
DOI: https://doi.org/10.1007/978-3-030-30796-7_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30795-0
Online ISBN: 978-3-030-30796-7
eBook Packages: Computer Science (R0)
Published in cooperation with
http://swsa.semanticweb.org/