Name2Vec: Personal Names Embeddings

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11489)

Abstract

Predicting whether two names refer to the same entity is an important task in many domains, such as information retrieval, record linkage, and data integration. In this paper, we propose creating name embeddings with a Doc2Vec methodology, in which each name is viewed as a document and each letter in the name is treated as a word. Our hypothesis is that representing names as documents, with letters as words, helps capture the internal structure of names and the relationships among letters. We present and discuss an experimental study in which we explore the effect of various parameters and assess the stability of the models built for embedding names. Our results show that the proposed method predicts with high accuracy whether a pair of names matches.
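The idea in the abstract, names as documents and letters as words, can be illustrated with a minimal sketch. It is not the authors' implementation: the use of gensim's Doc2Vec, the helper names (name_to_document, cosine_similarity, train_name_embeddings, names_match), the lowercasing and whitespace stripping, the hyperparameters, and the 0.8 threshold are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def name_to_document(name: str) -> List[str]:
    """Treat a name as a document whose words are its individual letters.

    Lowercasing and dropping whitespace are illustrative assumptions.
    """
    return [ch for ch in name.lower() if not ch.isspace()]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def train_name_embeddings(names, vector_size=32, window=2, epochs=40):
    """Train a Doc2Vec model on letter-level documents.

    Assumes gensim is installed; the hyperparameter values are placeholders,
    not the settings studied in the paper.
    """
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [TaggedDocument(words=name_to_document(n), tags=[n]) for n in names]
    model = Doc2Vec(vector_size=vector_size, window=window, min_count=1, epochs=epochs)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    return model


def names_match(model, name_a: str, name_b: str, threshold: float = 0.8) -> bool:
    """Predict a match when the inferred embeddings are similar enough.

    The threshold is a hypothetical value and would need tuning on labeled pairs.
    """
    vec_a = model.infer_vector(name_to_document(name_a))
    vec_b = model.infer_vector(name_to_document(name_b))
    return bool(cosine_similarity(vec_a, vec_b) >= threshold)
```

A possible usage, under the same assumptions, would be to call train_name_embeddings on a list of known names and then names_match(model, "catherine", "katherine"). Because Doc2Vec inference is stochastic, repeated calls can yield slightly different similarities.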

Notes

  1. This is due to the fact that some of the vectors could have negative values as well.

Author information

Corresponding author

Correspondence to Luiza Antonie.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Foxcroft, J., d’Alessandro, A., Antonie, L. (2019). Name2Vec: Personal Names Embeddings. In: Meurs, MJ., Rudzicz, F. (eds) Advances in Artificial Intelligence. Canadian AI 2019. Lecture Notes in Computer Science, vol 11489. Springer, Cham. https://doi.org/10.1007/978-3-030-18305-9_52

  • DOI: https://doi.org/10.1007/978-3-030-18305-9_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18304-2

  • Online ISBN: 978-3-030-18305-9

  • eBook Packages: Computer Science, Computer Science (R0)
