Name2Vec: Personal Names Embeddings

Foxcroft, Jeremy; d’Alessandro, Adrian; Antonie, Luiza

doi:10.1007/978-3-030-18305-9_52

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11489)

Included in the following conference series:

Canadian Conference on Artificial Intelligence

Abstract

Predicting whether two names refer to the same entity is an important task in many domains, such as information retrieval, record linkage, and data integration. In this paper, we propose creating name embeddings with a Doc2Vec methodology in which each name is treated as a document and each letter in the name as a word. Our hypothesis is that representing names as documents, with letters as words, helps capture the internal structure of names and the relationships among letters. We present and discuss an experimental study that explores the effect of various parameters and assesses the stability of the models built to embed names. Our results show that the proposed method can predict with high accuracy whether a pair of names matches.
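
The name-as-document idea in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the use of the gensim library, the hyperparameter values (vector_size, window, epochs), the cosine-similarity comparison, the 0.8 threshold, and all function names are assumptions made for illustration only. The gensim import is placed inside the training function so that the two helper functions run with the standard library alone.

```python
import math


def name_to_letters(name):
    """Treat a name as a 'document' whose 'words' are its letters."""
    return [ch for ch in name.lower() if ch.isalpha()]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def train_name_model(names, vector_size=32, window=2, epochs=40):
    """Train a Doc2Vec model on names; hyperparameters are illustrative guesses."""
    # Imported lazily: gensim is a third-party dependency assumed for this sketch.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [
        TaggedDocument(words=name_to_letters(name), tags=[i])
        for i, name in enumerate(names)
    ]
    # min_count=1 keeps every letter in the vocabulary, even rare ones.
    model = Doc2Vec(vector_size=vector_size, window=window,
                    min_count=1, epochs=epochs)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    return model


def names_match(model, name_a, name_b, threshold=0.8):
    """Hypothetical decision rule: embed both names and threshold the cosine."""
    vec_a = model.infer_vector(name_to_letters(name_a))
    vec_b = model.infer_vector(name_to_letters(name_b))
    return bool(cosine_similarity(vec_a, vec_b) >= threshold)
```

With gensim installed, a call such as names_match(train_name_model(list_of_names), "Catherine", "Katherine") would return a boolean. The outcome depends on the training data, the random seed, and the assumed threshold, so no expected value is given here.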

Notes

1. This is due to the fact that some of the vectors could have negative values as well.

Author information

Authors and Affiliations

School of Computer Science, University of Guelph, Guelph, Canada
Jeremy Foxcroft, Adrian d’Alessandro & Luiza Antonie

Corresponding author

Correspondence to Luiza Antonie.

Editor information

Editors and Affiliations

University of Quebec in Montreal, Montreal, QC, Canada
Marie-Jean Meurs
University of Toronto, Toronto, ON, Canada
Frank Rudzicz

About this paper

Cite this paper

Foxcroft, J., d’Alessandro, A., Antonie, L. (2019). Name2Vec: Personal Names Embeddings. In: Meurs, MJ., Rudzicz, F. (eds) Advances in Artificial Intelligence. Canadian AI 2019. Lecture Notes in Computer Science, vol 11489. Springer, Cham. https://doi.org/10.1007/978-3-030-18305-9_52

DOI: https://doi.org/10.1007/978-3-030-18305-9_52
Published: 24 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18304-2
Online ISBN: 978-3-030-18305-9
eBook Packages: Computer Science, Computer Science (R0)
