Name usage pattern in the synonym ambiguity problem in bibliographic data
Individuals often appear under multiple names in large bibliographic datasets, giving rise to the synonym ambiguity problem. Although most related work focuses on resolving name ambiguities, this work focuses on classifying and characterizing multiple name usage patterns, the root cause of such ambiguity. By considering real examples from bibliographic datasets, we identify and classify patterns of multiple name usage by individuals, which can be interpreted as name change, rare name usage, and name co-appearance. In particular, we propose a methodology to classify name usage patterns through a supervised classification task and show that the different classes are robust (across datasets) and exhibit significantly different properties. We show that the collaboration network structure emerging around nodes corresponding to ambiguous names differs strikingly across name usage patterns, in characteristics such as common neighborhood and degree evolution. We believe such differences in network structure and in name usage patterns can be leveraged to design more efficient name disambiguation algorithms that target the synonym problem.
Keywords: Name ambiguity, Classification, Collaboration network
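To make the two network characteristics named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy list of papers (the names, years, and the `PAPERS` variable are hypothetical), measures common neighborhood as the Jaccard overlap of two names' coauthor sets, and measures degree evolution as the cumulative number of distinct coauthors per year. The paper may define both quantities differently.

```python
from collections import defaultdict

# Hypothetical toy records: (publication year, list of author name strings).
PAPERS = [
    (2001, ["J. Smith", "A. Lee"]),
    (2002, ["J. Smith", "B. Kumar"]),
    (2004, ["Jane Smith", "A. Lee", "C. Diaz"]),
    (2005, ["Jane Smith", "B. Kumar"]),
]


def build_network(papers):
    """Map each name to {coauthor name: year of their first joint paper}."""
    first_year = defaultdict(dict)
    for year, authors in sorted(papers, key=lambda p: p[0]):
        for a in authors:
            for b in authors:
                if a != b:
                    # setdefault keeps the earliest year because papers are sorted.
                    first_year[a].setdefault(b, year)
    return first_year


def neighborhood_overlap(net, name_a, name_b):
    """Jaccard overlap of the coauthor sets of two names (assumed measure)."""
    na = set(net.get(name_a, {})) - {name_b}
    nb = set(net.get(name_b, {})) - {name_a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def degree_evolution(net, name):
    """Cumulative number of distinct coauthors, keyed by year of first contact."""
    evolution = {}
    years = sorted(net.get(name, {}).values())
    for count, year in enumerate(years, start=1):
        evolution[year] = count  # later entries for the same year overwrite earlier ones
    return evolution


net = build_network(PAPERS)
overlap = neighborhood_overlap(net, "J. Smith", "Jane Smith")
evo_old = degree_evolution(net, "J. Smith")
evo_new = degree_evolution(net, "Jane Smith")
```

In this toy data the two names never co-appear, share most coauthors, and are active in disjoint periods, which is the kind of signal one might associate with a name change. Any threshold or classifier built on such features would be a further assumption.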
This research received financial support through grants from FAPERJ and CNPq (Brazil).