
The coverage of Microsoft Academic: analyzing the publication output of a university

Scientometrics

Abstract

This is the first detailed study on the coverage of Microsoft Academic (MA). Based on the complete and verified publication list of a university, the coverage of MA was assessed and compared with two benchmark databases, Scopus and Web of Science (WoS), at the level of individual publications. Citation counts were analyzed, and issues related to data retrieval and data quality were examined. A Perl script was written to retrieve metadata from MA based on publication titles. The script is freely available on GitHub. We find that MA covers journal articles, working papers, and conference items to a substantial extent and indexes more document types than the benchmark databases (e.g., working papers, dissertations). MA clearly surpasses Scopus and WoS in covering book-related document types and conference items but falls slightly behind Scopus in covering journal articles. The coverage of MA is favorable for evaluative bibliometrics in most research fields, including economics/business, computer/information sciences, and mathematics. However, MA shows biases similar to Scopus and WoS with regard to the coverage of the humanities, non-English publications, and open-access publications. Rank correlations of citation counts are high between MA and the benchmark databases. We find that the publication year is correct for 89.5% of all publications and the number of authors is correct for 95.1% of the journal articles. Given the fast and ongoing development of MA, we conclude that MA is on the verge of becoming a bibliometric superpower. However, comprehensive studies on the quality of MA metadata are still lacking.
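
The abstract refers to a Perl script (note 9) that retrieves MA metadata by publication title through the Academic Knowledge API (note 8). The sketch below is not that script; it is a minimal Python illustration of the same title-based lookup, assuming the 2017-era Evaluate endpoint, the Ocp-Apim-Subscription-Key header, and the attribute names (Ti, Y, CC, AA.AuN) described in the API documentation. The endpoint URL and subscription key are placeholders; the service has since been retired, so treat all specifics as assumptions.

import re
import requests

# 2017-era Academic Knowledge API Evaluate endpoint (assumption; see note 8).
ENDPOINT = "https://westus.api.cognitive.microsoft.com/academic/v1.0/evaluate"
SUBSCRIPTION_KEY = "YOUR-SUBSCRIPTION-KEY"  # placeholder, not a real key

def normalize_title(title):
    # The API matches on normalized titles (Ti): lowercase, punctuation removed.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return re.sub(r"\s+", " ", cleaned).strip()

def query_by_title(title, count=5):
    # Request normalized title (Ti), publication year (Y), citation count (CC),
    # and author names (AA.AuN) for candidate matches to the given title.
    params = {
        "expr": "Ti='{0}'".format(normalize_title(title)),
        "attributes": "Ti,Y,CC,AA.AuN",
        "count": count,
    }
    headers = {"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY}
    response = requests.get(ENDPOINT, params=params, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json().get("entities", [])

if __name__ == "__main__":
    for entity in query_by_title("The coverage of Microsoft Academic"):
        print(entity.get("Ti"), entity.get("Y"), entity.get("CC"))

Looping such a query over a verified institutional publication list, as the authors did for ZORA with their Perl implementation, yields the per-publication matches on which the coverage comparison rests.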


Notes

  1. https://academic.microsoft.com.

  2. https://www.aka.ms/AcademicAPI.

  3. https://www.zora.uzh.ch.

  4. http://www.eprints.org/.

  5. https://github.com/QUTlib/citation-import.

  6. http://api.elsevier.com/content/search/scopus.

  7. http://ipscience-help.thomsonreuters.com/LAMRService/WebServicesOverviewGroup/overview.html.

  8. For an overview of the Academic Knowledge (AK) API, see https://docs.microsoft.com/en-us/azure/cognitive-services/academic-knowledge/home.

  9. https://github.com/eprintsug/microsoft-academic.


Acknowledgments

The authors thank the development team of Microsoft Academic for their support, the ZORA editorial team for their advice, Robin Haunschild for comments, Mirjam Aeschbach for proofreading, and the reviewers for their remarks.

Author information

Correspondence to Sven E. Hug.


About this article


Cite this article

Hug, S.E., Brändle, M.P. The coverage of Microsoft Academic: analyzing the publication output of a university. Scientometrics 113, 1551–1571 (2017). https://doi.org/10.1007/s11192-017-2535-3
