Methods for estimating the size of Google Scholar
The emergence of academic search engines (mainly Google Scholar and Microsoft Academic Search) that aspire to index the entirety of current academic knowledge has revived and increased interest in the size of the academic web. The main objective of this paper is to propose various methods for estimating the current size (number of indexed documents) of Google Scholar (May 2014) and to determine their validity, precision and reliability. To do this, we present, apply and discuss three empirical methods: an external estimate based on empirical studies of Google Scholar coverage, and two internal estimate methods based on direct, empty and absurd queries, respectively. Although the methods yield disparate values, the results place the estimated size of Google Scholar at around 160–165 million documents. However, all the methods show considerable limitations and uncertainties due to inconsistencies in Google Scholar's search functionalities.
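The abstract only names the query-based methods. The sketch below is a generic illustration of one way hit counts from direct queries can be aggregated into a size estimate, not the authors' actual procedure. It assumes, hypothetically, that the index is sliced into disjoint publication-year ranges, that one query is issued per range, and that the hit count the engine reports for each is recorded. The function name `aggregate_hit_counts` and the input layout are invented for this example.

```python
from typing import Dict, Tuple

# Hypothetical input layout: (start_year, end_year), inclusive on both ends,
# mapped to the hit count the search engine reported for that year range.
YearRange = Tuple[int, int]


def aggregate_hit_counts(hits: Dict[YearRange, int]) -> int:
    """Sum reported hit counts over disjoint, inclusive year ranges.

    Raises ValueError for malformed ranges, negative counts, or
    overlapping ranges (overlaps would double-count documents).
    """
    ranges = sorted(hits)  # sorts by start year, then end year
    for start, end in ranges:
        if start > end:
            raise ValueError(f"invalid range: {start}-{end}")
        if hits[(start, end)] < 0:
            raise ValueError(f"negative count for {start}-{end}")
    # After sorting by start year, any overlap shows up between neighbours.
    for (s1, e1), (s2, e2) in zip(ranges, ranges[1:]):
        if s2 <= e1:
            raise ValueError(f"overlapping ranges: {s1}-{e1} and {s2}-{e2}")
    # Documents with no year, or outside every range, are never counted,
    # and each reported hit count is itself an approximation, so the total
    # inherits both sources of error.
    return sum(hits.values())
```

Keeping the ranges disjoint is the design choice that matters here: an overlap silently inflates the total, which is why the function rejects it rather than trying to correct for it.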
Keywords: Academic search engines; Google Scholar; Estimation methods; Size; Coverage; Webometrics