Name usage pattern in the synonym ambiguity problem in bibliographic data

Gomide, Janaína; Kling, Hugo; Figueiredo, Daniel

doi:10.1007/s11192-017-2410-2


Published: 16 May 2017

Scientometrics, Volume 112, pages 747–766 (2017)

Abstract

Individuals often appear with multiple names in large bibliographic datasets, giving rise to the synonym ambiguity problem. Although most related works focus on resolving name ambiguities, this work focuses on classifying and characterizing multiple name usage patterns, the root cause of such ambiguity. By considering real examples from bibliographic datasets, we identify and classify patterns of multiple name usage by individuals, which can be interpreted as name change, rare name usage, and name co-appearance. In particular, we propose a methodology to classify name usage patterns through a supervised classification task and show that the different classes are robust (across datasets) and exhibit significantly different properties. We show that the collaboration network structure emerging around nodes corresponding to ambiguous names from different name usage patterns has strikingly different characteristics, such as common neighborhood and degree evolution. We believe such differences in network structure and in name usage patterns can be leveraged to design more efficient name disambiguation algorithms that target the synonym problem.
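To make the two network properties named above concrete, here is a minimal Python sketch of how common neighborhood and degree evolution could be computed for a pair of candidate synonym names in a coauthorship graph. The toy records, author names, and helper functions (`coauthor_graph`, `common_neighbors`, `degree_evolution`, `active_years`) are hypothetical and for illustration only. This is not the authors' feature set, and it does not reproduce their supervised classifier.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (paper_id, year, author names as printed on the paper).
# A real analysis would load these from a bibliographic dataset such as DBLP.
RECORDS = [
    ("p1", 2001, ["J. Silva", "A. Costa"]),
    ("p2", 2003, ["J. Silva", "B. Lima"]),
    ("p3", 2006, ["Joao Silva", "A. Costa"]),
    ("p4", 2008, ["Joao Silva", "B. Lima", "C. Rocha"]),
]


def coauthor_graph(records, until_year=None):
    """Undirected coauthorship graph (name -> set of coauthor names),
    optionally restricted to papers published up to and including `until_year`."""
    graph = defaultdict(set)
    for _, year, names in records:
        if until_year is not None and year > until_year:
            continue
        # Every pair of distinct names on the same paper becomes an edge.
        for a, b in combinations(sorted(set(names)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def common_neighbors(graph, name_a, name_b):
    """Coauthors shared by two candidate synonym names."""
    return graph.get(name_a, set()) & graph.get(name_b, set())


def degree_evolution(records, name):
    """Degree of `name` in the cumulative graph at the end of each year with data."""
    years = sorted({year for _, year, _ in records})
    return {y: len(coauthor_graph(records, y).get(name, set())) for y in years}


def active_years(records, name):
    """Years in which `name` appears on at least one paper."""
    return sorted({year for _, year, names in records if name in names})


if __name__ == "__main__":
    g = coauthor_graph(RECORDS)
    print(common_neighbors(g, "J. Silva", "Joao Silva"))
    print(degree_evolution(RECORDS, "J. Silva"))
    print(degree_evolution(RECORDS, "Joao Silva"))
    print(active_years(RECORDS, "J. Silva"), active_years(RECORDS, "Joao Silva"))
```

In this toy data the two names share both of their early coauthors and are active in disjoint periods, a timeline one might informally read as a name change. The actual assignment of pairs to classes in the paper comes from a supervised classification task, which this sketch does not attempt.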


Similar content being viewed by others

Exploiting citation networks for large-scale author name disambiguation (article, open access, 25 September 2014)

Author Name Disambiguation by Exploiting Graph Structural Clustering and Hybrid Similarity (article, 16 February 2018)

Collecting large-scale publication data at the level of individual researchers: a practical proposal for author name disambiguation (article, 07 March 2020)

Notes

http://dblp.uni-trier.de.
http://scholar.google.com.
Link to download the XML of the entire DBLP database: http://dblp.uni-trier.de/xml/.
See details on DBLP’s handling of synonyms at http://dblp.uni-trier.de/faq/How+does+dblp+handle+homonyms+and+synonyms.html.
CNPq is the Brazilian National Research Council responsible for funding research, similar to the National Science Foundation (NSF) in the United States of America.


Acknowledgements

This research received financial support through grants from FAPERJ and CNPq (Brazil).

Author information

Authors and Affiliations

PESC/COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Janaína Gomide, Hugo Kling & Daniel Figueiredo


Corresponding author

Correspondence to Janaína Gomide.


About this article

Cite this article

Gomide, J., Kling, H. & Figueiredo, D. Name usage pattern in the synonym ambiguity problem in bibliographic data. Scientometrics 112, 747–766 (2017). https://doi.org/10.1007/s11192-017-2410-2


Received: 06 July 2016
Published: 16 May 2017
Issue Date: August 2017
DOI: https://doi.org/10.1007/s11192-017-2410-2
