TMM: Entity Resolution for Deep Web

  • Conference paper
Recent Developments in Mechatronics and Intelligent Robotics (ICMIR 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 691)

Abstract

Entity resolution identifies different records that refer to the same real-world entity. Building on the idea of clustering, we propose a novel framework, the twice-merging model (TMM). We introduce self-matching and discuss novel approaches to automatic blocking, matching evaluation, self-matching detection, similarity calculation, and cluster generation and merging. Experimental results show that our method effectively reduces the matching space and improves both matching accuracy and system efficiency.
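
The abstract does not describe the algorithms in detail, so the following is only a generic sketch of how blocking, pairwise similarity, and cluster merging can fit together. It is not the paper's TMM method. The blocking key (first three characters of a hypothetical `title` field), the `difflib.SequenceMatcher` similarity, the 0.8 threshold, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record):
    # Assumed blocking rule: first three characters of the lowercased title.
    return record["title"].lower()[:3]


def similarity(a, b):
    # Standard-library string similarity in [0, 1]; a stand-in for any
    # record-level similarity measure.
    return SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()


def resolve(records, threshold=0.8):
    """Group record indices into clusters; also return the number of
    pairwise comparisons that were actually performed."""
    # Union-find over record indices: each root represents one cluster.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Blocking: only records sharing a key are compared with each other,
    # which shrinks the matching space.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)

    compared = 0
    for members in blocks.values():
        for i, j in combinations(members, 2):
            compared += 1
            if similarity(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values()), compared
```

With four records, three of which share a blocking key, this sketch compares three pairs instead of all six, which illustrates in miniature what reducing the matching space means.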

Author information

Corresponding author

Correspondence to Chen Lijun.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Lijun, C. (2018). TMM: Entity Resolution for Deep Web. In: Qiao, F., Patnaik, S., Wang, J. (eds) Recent Developments in Mechatronics and Intelligent Robotics. ICMIR 2017. Advances in Intelligent Systems and Computing, vol 691. Springer, Cham. https://doi.org/10.1007/978-3-319-70990-1_32

  • DOI: https://doi.org/10.1007/978-3-319-70990-1_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70989-5

  • Online ISBN: 978-3-319-70990-1

  • eBook Packages: Engineering, Engineering (R0)
