Abstract
In this chapter, we present the EVA (Extraction, Validation, and Analysis) project and related results about the use of Google Scholar as web database for calculation of citation indexes in non-bibliometric scientific areas, such as social sciences and humanities. The results of the EVA project are presented on a case-study about the publication records retrieved from Google Scholar for a dataset of Italian academic researchers belonging to non-bibliometric scientific areas. The evaluation results against the Elsevier-Scopus citation database are also discussed.
Notes
- 1.
The complete name of the area is Antiquities, Philological-Literacy and Historical Artistic Sciences. It has been shortened only for readability. The name of the Italian researcher is irrelevant for the clarity of the example and it is omitted.
- 2.
In this chapter, we call publication p the bibliographic record returned by Google Scholar in response to a query.
- 3.
A detailed description of normalization techniques and related NLP functions is provided in (Manning et al. 2008).
- 4.
The edit distance between two titles’ strings (S1 and S2) is the minimum number of operations (inclusion, substitution or deletion) on single characters needed to transform S1 into S2.
References
Aguillo, I. F. (2012). Is Google scholar useful for bibliometrics? A webometric analysis. Scientometrics, 91(2), 343–351.
Archambault, É., & Larivière, V. (2010). The limits of bibliometrics for the analysis of the social sciences and humanities literature. World Social Science Report, 251–254.
Archambault, É., Vignola-Gagne, E., Côtè, G., Larivire, V., & Gingrasb, Y. (2006). Benchmarking scientific output in the social sciences and humanities: The limits of existing databases. Scientometrics, 68(3), 329–342.
Biolcati-Rinaldi, F., Ferrara, A., Pinotti, L., & Salini, S. (2012). Lesson learned by Unimival researchers during the comparative bibliometric analysis project (ABC). in Rassegna Italiana di Valutazione, v. XVI, n. 52, pp. 81-100, ISSN 1826-0713.
Bornmann, L., Mutz, R., Neuhaus, C., & Daniel, H. D. (2008). Citation counts for research evaluation: Standards of good practice for analyzing bibliometric data and presenting and interpreting results. Ethics in science and environmental politics, 8(1), 93–102.
Burrell, L. Q. (2006). Measuring concentration within and co-concentration between informetric distributions: An empirical study. Scientometrics, 68(3), 441–456. https://doi.org/10.1007/s11192-006-0122-0.
Checchi, D., De Fraja, G., & Verzillo, S. (2014). Publish or Perish: An Analysis of the Academic Job Market in Italy. Tech. Rep. Discussion Paper 10084, CEPR Discussion Paper.
D’Angelo, C. A., Giuffrida, C., & Abramo, G. (2011). A heuristic approach to author name disambiguation in bibliometrics databases for large-scale research assessments. Journal of the American Society for Information Science and Technology, 62(2), 257–269.
Daniel, H. D., & Fisch, R. (1990). Research performance evaluation in the german university sector. Scientometrics, 19(5), 349–361. https://doi.org/10.1007/BF02020698.
Dumais, S. T. (2004). Latent semantic analysis. Annual Review of Information Science and Technology, 38(1).
Falagas, M. E., Pitsouni, E. I., Malietzis, G. A., & Pappas, G. (2008). Comparison of PubMed, Scopus, web of science, and Google scholar: Strengths and weaknesses. The FASEB Journal, 22(2), 338–342.
Ferrara, A., & Salini, S. (2012). Ten challenges in modeling bibliographic data for Bibliometric analysis. Scientometrics, 93(3), 765–785.
Ferreira, A. A., Veloso, A., Gonçalves, M. A., & Laender, A. H. (2010). Effective self-training author name disambiguation in scholarly digital libraries. In Proceedings of the 10th annual ACM joint conference on digital libraries (pp. 39–48). Aarhus.
Garfield, E. (1980). Is information retrieval in the arts and humanities inherently different from that in science? The effect that ISI©‘S citation index for the arts and humanities is expected to have on future scholarship. The Library Quarterly, 40–57.
Han, H., Giles, L., Zha, H., Li, C., & Tsioutsiouliklis, K. (2004). Two supervised learning approaches for name disambiguation in author citations. In Proceedings of the 4th joint ACM/IEEE conference on digital libraries (JCDL 2004) (pp. 296–305). Tucson.
Han, H., Xu, W., Zha, H., & Giles, L. (2005a). A hierarchical naive Bayes mixture model for name disambiguation in author citations. In Proceedings of the ACM symposium on applied computing (pp. 1065–1069). Santa Fe.
Han, H., Zha, H., & Giles, L. (2005b). Name disambiguation in author citations using a Kway spectral clustering method. In Proceedings of the 5th joint ACM/IEEE conference on digital libraries (JCDL 2005) (pp. 334–343). Denver.
Kousha, K., & Thelwall, M. (2007). Google scholar citations and Google web/URL citations: A multi-discipline exploratory analysis. Journal of the American Society for Information Science and Technology, 58(7), 1055–1065.
Linmans, A. J. (2010). Why with bibliometrics the humanities does not need to be the weakest link. Scientometrics, 83(2), 337–354.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval (Vol. 1). Cambridge: Cambridge University Press.
Nederhof, A. J. (2006). Bibliometric monitoring of research performance in the social sciences and the humanities: A review. Scientometrics, 66(1), 81–100.
On, B. W., & Lee, D. (2007). Scalable name disambiguation using multi-level graph partition. In Proceedings of the SIAM International conference on data mining (pp. 575–580). Minneapolis, Minnesota.
Prins, A. A., Costas, R., van Leeuwen, T. N., & Wouters, P. F. (2016). Using google scholar in research evaluation of humanities and social science programs: A comparison with web of science data. Research Evaluation, 25(3), 264–270.
Smalheiser, N. R., & Torvik, V. I. (2009). Author name disambiguation. Annual Review of Information Science and Technology, 43(1), 1–43.
Tang, J., Fong, A. C. M., Wang, B., & Zhang, J. (2012). A unified probabilistic framework for name disambiguation in digital library. IEEE Transactions on Knowledge and Data Engineering, 24(6), 975–987.
Torvik, V. I., Weeber, M., Swanson, D. R., & Smalheiser, N. R. (2005). A probabilistic similarity metric for Medline records: A model for author name disambiguation. Journal of the American Society for Information Science and Technology, 56(2), 140–158.
Yang, K. H., Peng, H. T., Jiang, J. Y., Lee, H. M., & Ho, J. M. (2008). Author name disambiguation for citations using topic and web correlation. In Proceedings of the 12th European conference on digital libraries (pp. 185–196). Aarhus, Denmark.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Ferrara, A., Montanelli, S., Verzillo, S. (2018). Google Scholar as a Citation Database for Non-bibliometric Areas: The EVA Project Results. In: Bonaccorsi, A. (eds) The Evaluation of Research in Social Sciences and Humanities. Springer, Cham. https://doi.org/10.1007/978-3-319-68554-0_12
Download citation
DOI: https://doi.org/10.1007/978-3-319-68554-0_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68553-3
Online ISBN: 978-3-319-68554-0
eBook Packages: Social SciencesSocial Sciences (R0)