
Scientometrics, Volume 113, Issue 3, pp 1551–1571

The coverage of Microsoft Academic: analyzing the publication output of a university

  • Sven E. Hug
  • Martin P. Brändle
Article

Abstract

This is the first detailed study on the coverage of Microsoft Academic (MA). Based on the complete and verified publication list of a university, the coverage of MA was assessed and compared with two benchmark databases, Scopus and Web of Science (WoS), on the level of individual publications. Citation counts were analyzed, and issues related to data retrieval and data quality were examined. A Perl script was written to retrieve metadata from MA based on publication titles. The script is freely available on GitHub. We find that MA covers journal articles, working papers, and conference items to a substantial extent and indexes more document types than the benchmark databases (e.g., working papers, dissertations). MA clearly surpasses Scopus and WoS in covering book-related document types and conference items but falls slightly behind Scopus in journal articles. The coverage of MA is favorable for evaluative bibliometrics in most research fields, including economics/business, computer/information sciences, and mathematics. However, MA shows biases similar to Scopus and WoS with regard to the coverage of the humanities, non-English publications, and open-access publications. Rank correlations of citation counts are high between MA and the benchmark databases. We find that the publication year is correct for 89.5% of all publications and the number of authors is correct for 95.1% of the journal articles. Given the fast and ongoing development of MA, we conclude that MA is on the verge of becoming a bibliometric superpower. However, comprehensive studies on the quality of MA metadata are still lacking.
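The Perl script referenced in the abstract is available on GitHub and is not reproduced here. As a rough, hedged illustration of title-based metadata retrieval from Microsoft Academic, the sketch below queries the Evaluate endpoint of the Academic Knowledge API with a normalized publication title. The endpoint URL, the Ti='...' expression syntax, the attribute names (Ti, Y, CC, AA.AuN, E), the Ocp-Apim-Subscription-Key header, and the MA_API_KEY environment variable are assumptions drawn from the publicly documented API of that period, not from the paper itself; the service has since been retired.

#!/usr/bin/perl
# Illustrative sketch only, not the authors' GitHub script. Assumes the
# (now retired) Academic Knowledge API "Evaluate" endpoint and its
# Ti='...' title expression; adjust endpoint and key for your own setup.
use strict;
use warnings;
use LWP::UserAgent;
use URI::Escape qw(uri_escape);
use JSON qw(decode_json);

my $api_key  = $ENV{MA_API_KEY} or die "Set MA_API_KEY first\n";  # hypothetical env variable
my $endpoint = 'https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate';

# Query Microsoft Academic for one publication title and return the first matching entity.
sub query_title {
    my ($title) = @_;
    # MA matches lower-cased titles without punctuation, so normalize first.
    (my $norm = lc $title) =~ s/[[:punct:]]+/ /g;
    $norm =~ s/\s+/ /g;
    $norm =~ s/^\s+|\s+$//g;
    my $expr = uri_escape("Ti='$norm'");
    my $attr = 'Ti,Y,CC,AA.AuN,E';   # title, year, citation count, author names, extended metadata
    my $url  = "$endpoint?expr=$expr&attributes=$attr&count=1";

    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url, 'Ocp-Apim-Subscription-Key' => $api_key);
    die 'Request failed: ' . $res->status_line unless $res->is_success;
    return decode_json($res->content)->{entities}[0];
}

my $entity = query_title('The coverage of Microsoft Academic: analyzing the publication output of a university');
printf("Year: %s, citations: %s\n", $entity->{Y} // 'n/a', $entity->{CC} // 'n/a') if $entity;

In such a setup, each retrieved entity would then be matched back against the verified publication list (e.g., by comparing publication year and author count), which is the kind of check the study reports for data quality.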

Keywords

Coverage; Research fields; Publication language; Microsoft Academic; Scopus; Web of Science; EPrints; Citation analysis


Acknowledgments

The authors thank the development team of Microsoft Academic for their support, the ZORA editorial team for their advice, Robin Haunschild for comments, Mirjam Aeschbach for proofreading, and the reviewers for their remarks.


Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2017

Authors and Affiliations

  1. Social Psychology and Research on Higher Education, ETH Zurich, D-GESS, Zurich, Switzerland
  2. Evaluation Office, University of Zurich, Zurich, Switzerland
  3. Zentrale Informatik, University of Zurich, Zurich, Switzerland
  4. Main Library, University of Zurich, Zurich, Switzerland
