The coverage of Microsoft Academic: analyzing the publication output of a university
This is the first detailed study on the coverage of Microsoft Academic (MA). Based on the complete and verified publication list of a university, the coverage of MA was assessed and compared with two benchmark databases, Scopus and Web of Science (WoS), on the level of individual publications. Citation counts were analyzed, and issues related to data retrieval and data quality were examined. A Perl script was written to retrieve metadata from MA based on publication titles. The script is freely available on GitHub. We find that MA covers journal articles, working papers, and conference items to a substantial extent and indexes more document types than the benchmark databases (e.g., working papers, dissertations). MA clearly surpasses Scopus and WoS in covering book-related document types and conference items but falls slightly behind Scopus in journal articles. The coverage of MA is favorable for evaluative bibliometrics in most research fields, including economics/business, computer/information sciences, and mathematics. However, MA shows biases similar to Scopus and WoS with regard to the coverage of the humanities, non-English publications, and open-access publications. Rank correlations of citation counts are high between MA and the benchmark databases. We find that the publication year is correct for 89.5% of all publications and the number of authors is correct for 95.1% of the journal articles. Given the fast and ongoing development of MA, we conclude that MA is on the verge of becoming a bibliometric superpower. However, comprehensive studies on the quality of MA metadata are still lacking.
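The paper's Perl script looks up publications in Microsoft Academic by title. As a rough illustration of that kind of title-based lookup, the sketch below builds a query for the (now retired) Academic Knowledge API in Python; the endpoint, the attribute names, and the title-normalization rules are assumptions, not the authors' actual code.

```python
import re
import urllib.parse

def normalize_title(title: str) -> str:
    """Lowercase the title and strip punctuation; the MA query language
    matched against normalized titles (assumption)."""
    title = title.lower()
    title = re.sub(r"[^a-z0-9\s]", " ", title)  # replace punctuation with spaces
    return re.sub(r"\s+", " ", title).strip()   # collapse repeated whitespace

def build_evaluate_url(title: str, attributes: str = "Ti,Y,AA.AuN,CC") -> str:
    """Build an 'evaluate' request URL that matches one publication by title.

    The attribute list (title, year, author names, citation count) is a
    hypothetical choice for illustration.
    """
    expr = f"Ti='{normalize_title(title)}'"
    query = urllib.parse.urlencode({"expr": expr, "attributes": attributes})
    return f"https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate?{query}"
```

A real retrieval loop would send each URL with an API subscription key and compare the returned metadata (year, author count, citations) against the verified university publication list, as the study describes.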
Keywords: Coverage · Research fields · Publication language · Microsoft Academic · Scopus · Web of Science · EPrints · Citation analysis
The authors thank the development team of Microsoft Academic for their support, the ZORA editorial team for their advice, Robin Haunschild for comments, Mirjam Aeschbach for proofreading, and the reviewers for their remarks.