
When Entity Resolution Meets Deep Learning, Is Similarity Measure Necessary?

  • Conference paper
Advances in Artificial Intelligence and Applied Cognitive Computing

Abstract

In Entity Resolution (ER), the growing share of unstructured records challenges traditional similarity-based approaches, because existing similarity metrics were designed for structured records. Given that similarity is hard to measure for unstructured records, can pairwise matching be done without a similarity measure? To answer this question, this research uses deep learning to learn the underlying record-matching pattern directly, rather than first measuring the similarity between records and then making a linking decision based on that measure. On the representation side, word embedding takes token order into account, whereas Bag-of-Words (Count and TF-IDF) does not; on the model side, a multilayer perceptron (MLP), a convolutional neural network (CNN), and a long short-term memory network (LSTM) are examined. Our experiments on both synthetic and real-world data demonstrate that, surprisingly, the simplest representation (Count) combined with the simplest model (MLP) achieves the best results in both effectiveness and efficiency. An F-measure as high as 1.00 on the pairwise matching task suggests potential for applying deep learning to other ER tasks, such as blocking.
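
To make the Count-plus-MLP combination concrete, the following is a minimal sketch of similarity-free pairwise matching, not the authors' implementation. It assumes a record pair is encoded by concatenating the two Count vectors, which the abstract does not specify. The toy records, the `TinyMLP` class, the hidden-layer size, and the learning rate are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def build_vocab(records):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for rec in records for tok in rec.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def count_vector(record, vocab):
    """Bag-of-Words Count vector: token frequencies, token order discarded."""
    counts = Counter(record.lower().split())
    return [float(counts.get(tok, 0)) for tok in vocab]


def pair_features(a, b, vocab):
    """Assumed pair encoding: concatenate the two Count vectors."""
    return count_vector(a, vocab) + count_vector(b, vocab)


class TinyMLP:
    """Hypothetical MLP: one hidden ReLU layer, sigmoid output, plain SGD."""

    def __init__(self, n_in, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def predict_proba(self, x):
        """Estimated probability that the encoded pair is a match."""
        return self.forward(x)[1]

    def train_step(self, x, y, lr=0.1):
        h, p = self.forward(x)
        dz = p - y  # gradient of binary cross-entropy w.r.t. the output logit
        for j, hj in enumerate(h):
            # Backpropagate through ReLU using the pre-update output weight.
            dh = dz * self.w2[j] if hj > 0 else 0.0
            self.w2[j] -= lr * dz * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh * xi
            self.b1[j] -= lr * dh
        self.b2 -= lr * dz


# Toy, made-up records: the first two describe the same entity.
records = [
    "john smith 12 main st",
    "smith john main st 12",
    "mary jones 4 oak ave",
]
vocab = build_vocab(records)
labelled_pairs = [
    (records[0], records[1], 1.0),  # match
    (records[0], records[2], 0.0),  # non-match
    (records[1], records[2], 0.0),  # non-match
]

model = TinyMLP(n_in=2 * len(vocab))
for _ in range(200):
    for a, b, y in labelled_pairs:
        model.train_step(pair_features(a, b, vocab), y)

match_score = model.predict_proba(pair_features(records[0], records[1], vocab))
```

Note that the first two toy records receive identical Count vectors even though their tokens appear in a different order, which is the sense in which Bag-of-Words ignores token order. The classifier outputs a match probability directly from the pair encoding, with no hand-designed similarity score computed in between.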



Corresponding author

Correspondence to Xinming Li.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, X., Talburt, J.R., Li, T., Liu, X. (2021). When Entity Resolution Meets Deep Learning, Is Similarity Measure Necessary? In: Arabnia, H.R., Ferens, K., de la Fuente, D., Kozerenko, E.B., Olivas Varela, J.A., Tinetti, F.G. (eds) Advances in Artificial Intelligence and Applied Cognitive Computing. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-70296-0_10


  • DOI: https://doi.org/10.1007/978-3-030-70296-0_10


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-70295-3

  • Online ISBN: 978-3-030-70296-0

  • eBook Packages: Computer Science, Computer Science (R0)
