Scientometrics, Volume 104, Issue 3, pp 931–949

Methods for estimating the size of Google Scholar

  • Enrique Orduna-Malea
  • Juan M. Ayllón
  • Alberto Martín-Martín
  • Emilio Delgado López-Cózar
Article

Abstract

The emergence of academic search engines (mainly Google Scholar and Microsoft Academic Search) that aspire to index the entirety of current academic knowledge has revived and increased interest in the size of the academic web. The main objective of this paper is to propose several methods for estimating the current size (number of indexed documents) of Google Scholar (May 2014) and to assess their validity, precision and reliability. To this end, we present, apply and discuss three empirical methods: an external estimate based on empirical studies of Google Scholar coverage, and two internal estimates, one based on direct, empty queries and the other on absurd queries. Although the results vary considerably, they place the estimated size of Google Scholar at around 160–165 million documents. However, all the methods show considerable limitations and uncertainties due to inconsistencies in Google Scholar's search functionalities.
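
The external method extrapolates from published coverage figures. As a rough sketch of that kind of ratio arithmetic (an illustration only, not the procedure or the data used in the paper), assume a coverage study reports two things: the fraction of a reference collection of known size that is found in Google Scholar, and the fraction of Google Scholar's records that belong to that collection. Dividing the number of matched documents by the second fraction scales the overlap up to the whole index. The function name `external_size_estimate` and all of its inputs are hypothetical.

```python
def external_size_estimate(reference_size: int, coverage: float, overlap_share: float) -> float:
    """Hypothetical ratio estimate of the number of documents a search engine indexes.

    reference_size: number of documents in a reference collection (assumed known)
    coverage: fraction of the reference collection found in the engine (assumption)
    overlap_share: fraction of the engine's records that belong to the collection (assumption)
    """
    # Both fractions must be positive proportions, otherwise the ratio is meaningless.
    if not (0 < coverage <= 1 and 0 < overlap_share <= 1):
        raise ValueError("coverage and overlap_share must lie in (0, 1]")
    matched = reference_size * coverage  # documents present in both the collection and the index
    return matched / overlap_share  # scale the overlap up to the whole index


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the arithmetic.
    print(external_size_estimate(1_000_000, 0.9, 0.5))
```

With these made-up inputs (a collection of 1,000,000 documents, 90% coverage, matches making up half of the index) the sketch returns 1,800,000. Errors in either fraction propagate directly into the estimate.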

Keywords

Academic search engines, Google Scholar, Estimation methods, Size, Coverage, Webometrics

Supplementary material

11192_2015_1614_MOESM1_ESM.pdf (451 kb)

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2015

Authors and Affiliations

  • Enrique Orduna-Malea (1)
  • Juan M. Ayllón (2)
  • Alberto Martín-Martín (2)
  • Emilio Delgado López-Cózar (2)
  1. EC3 Research Group, Polytechnic University of Valencia, Valencia, Spain
  2. EC3 Research Group, Universidad de Granada, Granada, Spain