Methods for estimating the size of Google Scholar

Scientometrics

Abstract

The emergence of academic search engines (mainly Google Scholar and Microsoft Academic Search) that aspire to index the entirety of current academic knowledge has revived and increased interest in the size of the academic web. The main objective of this paper is to propose various methods to estimate the size (number of indexed documents) of Google Scholar as of May 2014 and to determine their validity, precision and reliability. To do this, we present, apply and discuss three empirical methods: an external estimate based on empirical studies of Google Scholar coverage, and two internal estimate methods based on direct, empty and absurd queries, respectively. The results, despite providing disparate values, place the estimated size of Google Scholar at around 160–165 million documents. However, all the methods show considerable limitations and uncertainties due to inconsistencies in the Google Scholar search functionalities.

Notes

  1. http://thewebindex.org.

  2. The Custom range option appears after a query is submitted in the Google Scholar search box. The user can also set the year range through the advanced search option, or run the query directly from the browser via HTTP. Once the first results are obtained via hit count estimates, new queries can be generated without entering any keyword in the search box, selecting only the required time span. This is the procedure followed in this study.

  3. Additional information about the biases of WoS towards English and article document type is available in the supplementary material (Appendix V).
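
The keyword-free, year-restricted queries described in note 2 can be sketched as follows. This is only an illustrative sketch, not the authors' actual tooling: the endpoint and the `as_ylo`/`as_yhi` year-range parameter names are assumptions about how the Custom range option is expressed in the URL, and the helper names are hypothetical. The code builds URLs only; it does not contact Google Scholar or parse hit counts.

```python
from urllib.parse import urlencode

# Assumed endpoint; not taken from the article.
BASE_URL = "https://scholar.google.com/scholar"


def year_range_query_url(year_from, year_to, keywords=""):
    """Build a search URL restricted to a custom year range.

    An empty `keywords` string mimics the keyword-free query of note 2.
    `as_ylo` and `as_yhi` are assumed names for the year-range bounds.
    """
    if year_from > year_to:
        raise ValueError("year_from must not exceed year_to")
    params = {"q": keywords, "as_ylo": year_from, "as_yhi": year_to}
    return f"{BASE_URL}?{urlencode(params)}"


def yearly_query_urls(first_year, last_year):
    """Return one single-year, keyword-free query URL per year."""
    return {
        year: year_range_query_url(year, year)
        for year in range(first_year, last_year + 1)
    }


def total_hits(hit_counts_by_year):
    """Sum manually recorded hit count estimates (year -> count)."""
    return sum(hit_counts_by_year.values())
```

The hit count estimate shown for each URL would be recorded by hand and passed to `total_hits`. Any total obtained this way inherits the hit count inconsistencies that the abstract warns about.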

Author information

Corresponding author

Correspondence to Enrique Orduna-Malea.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 452 kb)

About this article

Cite this article

Orduna-Malea, E., Ayllón, J.M., Martín-Martín, A. et al. Methods for estimating the size of Google Scholar. Scientometrics 104, 931–949 (2015). https://doi.org/10.1007/s11192-015-1614-6

  • DOI: https://doi.org/10.1007/s11192-015-1614-6
