
Empowering Transformer with Hybrid Matching Knowledge for Entity Matching

Conference paper in Database Systems for Advanced Applications (DASFAA 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13247)

Abstract

Transformers have achieved great success in many NLP tasks. The self-attention mechanism of Transformer learns powerful representation by conducting token-level pairwise interactions within the input sequence. In this paper, we propose a novel entity matching framework named GTA. GTA enhances Transformer for relational data representation by injecting additional hybrid matching knowledge. The hybrid matching knowledge is obtained via graph contrastive learning on a designed hybrid matching graph, in which the dual-level matching and multiple granularity interactions are modeled. In this way, GTA utilizes the prelearned knowledge of both hybrid matching and language modeling. This effectively empowers Transformer to understand the structural features of relational data when performing entity matching. Extensive experiments on open datasets show that GTA effectively enhances Transformer for relational data representation and outperforms state-of-the-art entity matching frameworks.
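The pipeline the abstract outlines has two ingredients that can be sketched concretely: serializing a pair of relational tuples into one token sequence for a Transformer matcher, and pre-learning representations with a contrastive objective. The sketch below is illustrative of common practice in this line of work (e.g., Ditto-style `[COL]`/`[VAL]` attribute markers and an InfoNCE-style loss), not GTA's exact serialization or its hybrid-matching-graph construction; all names are hypothetical.

```python
import numpy as np

def serialize(tuple_a, tuple_b):
    """Serialize an entity pair into one sequence for a Transformer matcher.

    Hypothetical Ditto-style scheme: each attribute becomes
    "[COL] name [VAL] value"; the two entities are joined by [SEP].
    """
    def one(t):
        return " ".join(f"[COL] {k} [VAL] {v}" for k, v in t.items())
    return f"[CLS] {one(tuple_a)} [SEP] {one(tuple_b)} [SEP]"

def info_nce(anchor, views, pos_index=0, tau=0.1):
    """InfoNCE contrastive loss over one anchor embedding.

    views[pos_index] is the positive view; the rest are negatives.
    Pulls the positive close to the anchor, pushes negatives away.
    """
    sims = np.array([
        np.dot(anchor, z) / (np.linalg.norm(anchor) * np.linalg.norm(z))
        for z in views
    ]) / tau
    logits = np.exp(sims - sims.max())          # max-shift for stability
    return float(-np.log(logits[pos_index] / logits.sum()))

a = {"title": "iphone 12", "price": "699"}
b = {"title": "apple iphone 12", "price": "699.00"}
print(serialize(a, b))
```

As a sanity check on the loss, an anchor whose positive view points the same way incurs a much smaller loss than one whose positive view is orthogonal, which is exactly the pressure a graph contrastive objective uses to pre-learn matching knowledge before it is injected into the Transformer.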


Notes

  1. RoBERTa has shown that removing the next sentence prediction (NSP) training objective can improve downstream task performance.

  2. https://github.com/anhaidgroup/deepmatcher/blob/master/Datasets.md.

  3. https://www.dgl.ai/.

  4. https://huggingface.co/.


Acknowledgements

This work is supported by the National Natural Science Foundation of China (62172082, 62072084, 62072086, U1811261).

Author information

Correspondence to Derong Shen.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Dou, W. et al. (2022). Empowering Transformer with Hybrid Matching Knowledge for Entity Matching. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13247. Springer, Cham. https://doi.org/10.1007/978-3-031-00129-1_4


  • DOI: https://doi.org/10.1007/978-3-031-00129-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-00128-4

  • Online ISBN: 978-3-031-00129-1

  • eBook Packages: Computer Science (R0)
