Use of ResearchGate and Google CSE for author name disambiguation

Abstract

Author name disambiguation plays a very important role in individual-based bibliometric analysis, yet it has long suffered from a lack of information. Some studies have therefore leveraged external web sources to obtain additional evidence, with success. The main obstacle, however, is generally the high cost of extracting data from web pages because of their diverse designs. To address this challenge, we employed ResearchGate (RG), a social network platform on which scholars present their publication lists in a structured way. Even though the platform may be imperfect, it can be valuable when used alongside traditional approaches for the purpose of confirmation. To this end, in the first (retrieval) stage we applied a graph-based machine learning approach, connected components (CC), to form clusters. In the second stage, the data crawled from RG for the same authors were combined with the CC results. We observed that 76.40% of the clusters formed by CC were confirmed by the RG data, and these clusters accounted for 68.33% of all citations. Next, to examine the details, we drew a subset from the dataset by retaining only those clusters with at least 10 members. For this subset we additionally employed the Google Custom Search Engine (CSE) API to access authors’ web pages as a complementary tool to RG. We observed an F-score of 0.95 when the CC results were confirmed by RG and CSE; almost the same success was observed when only the CC approach was applied. In addition, publications identified and confirmed through the external sources were cited to a greater extent than publications not found in those sources. Although the results are promising, issues remain with the use of external sources. Many authors present only a few selected papers on the web, which prevents our procedure from obtaining the entire publication list. Missing publications affect bibliometric analysis adversely, since all citation data are required: if only the data confirmed via external sources are used, bibliometric indicators will be overestimated. On the other hand, our suggested methodology can potentially decrease the manual work required for individual-based bibliometric analysis. The procedure may also yield more reliable results by confirming cluster members derived from unsupervised grouping methods. This approach might be especially beneficial for large datasets, where extensive manual work would otherwise be required.
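The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy records, the similarity links, the external publication lists, and the rule in `is_confirmed` (a cluster counts as confirmed when a single external profile lists all of its members) are all assumptions made for illustration only. Stage 1 groups publication records into clusters with connected components; stage 2 checks each cluster against externally collected lists and reports the share of confirmed clusters and the share of citations they carry.

```python
from collections import defaultdict


def connected_components(n, edges):
    """Group n records into clusters via union-find over pairwise links."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for i in range(n):
        clusters[find(i)].add(i)
    return sorted(clusters.values(), key=min)


def is_confirmed(cluster, external_lists):
    """Hypothetical rule (an assumption, not taken from the paper):
    a cluster is confirmed if one external profile lists all its records."""
    return any(cluster <= profile for profile in external_lists)


# Toy data (invented): six publication records, ids 0..5.
# An edge links two records judged likely to share an author.
edges = [(0, 1), (1, 2), (3, 4)]
clusters = connected_components(6, edges)

# Invented external publication lists, e.g. one set per crawled profile.
external_lists = [{0, 1, 2, 9}, {3}]
# Invented citation counts per record id.
citations = [10, 5, 5, 2, 2, 1]

confirmed = [c for c in clusters if is_confirmed(c, external_lists)]
cluster_share = len(confirmed) / len(clusters)
citation_share = (
    sum(citations[i] for c in confirmed for i in c) / sum(citations)
)
print(f"confirmed clusters: {cluster_share:.2%}")
print(f"citations in confirmed clusters: {citation_share:.2%}")
```

In this toy setup the second profile lists only one of the two records in its cluster, so that cluster stays unconfirmed. This mirrors the limitation noted in the abstract: partial publication lists on the web leave otherwise plausible clusters without external confirmation.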

Author information

Corresponding author

Correspondence to Mehmet Ali Abdulhayoglu.

About this article

Cite this article

Abdulhayoglu, M.A., Thijs, B. Use of ResearchGate and Google CSE for author name disambiguation. Scientometrics 111, 1965–1985 (2017). https://doi.org/10.1007/s11192-017-2341-y

  • DOI: https://doi.org/10.1007/s11192-017-2341-y

Keywords

  • Author name disambiguation
  • ResearchGate
  • Google CSE
  • Information retrieval