Abstract
Entity resolution identifies different records that refer to the same entity. In this paper, building on the idea of clustering, we propose a novel framework, the twice-merging model (TMM). We introduce the notion of self-matching and discuss novel approaches to automatic blocking, matching evaluation, self-matching detection, similarity calculation, and cluster generation and merging. Experimental results show that our method effectively reduces the matching space and improves both matching accuracy and system efficiency.
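To make the general idea of blocking followed by cluster merging concrete, the following is a minimal sketch of a generic pipeline, not the TMM algorithm itself, whose details are not given in this abstract. The blocking key (first three letters of a title), the `difflib`-based similarity, the 0.8 threshold, and the names `block_key`, `similarity`, and `resolve` are all assumptions made for illustration. It performs a single merge pass and omits self-matching detection.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record):
    # Hypothetical blocking key: first three letters of the lowercased title.
    return record["title"].lower()[:3]


def similarity(a, b):
    # Illustrative string similarity in [0, 1]; the paper's measure may differ.
    return SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()


def resolve(records, threshold=0.8):
    # Step 1: blocking. Only records sharing a key are compared,
    # which shrinks the matching space.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)

    # Step 2: union-find structure used to merge matched records into clusters.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step 3: pairwise matching inside each block; merge pairs above threshold.
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)

    # Step 4: collect clusters as sorted lists of record indices.
    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


records = [
    {"title": "Entity Resolution for Deep Web"},
    {"title": "Entity resolution for the Deep Web"},
    {"title": "Pattern matching for interfaces"},
]
clusters = resolve(records)
```

With these three sample records, the first two share a block and are similar enough to be merged, while the third stays in its own cluster. A key that is too coarse increases the number of comparisons, and one that is too fine can separate true duplicates, which is presumably why the paper treats automatic blocking as a problem in its own right.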
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Lijun, C. (2018). TMM: Entity Resolution for Deep Web. In: Qiao, F., Patnaik, S., Wang, J. (eds) Recent Developments in Mechatronics and Intelligent Robotics. ICMIR 2017. Advances in Intelligent Systems and Computing, vol 691. Springer, Cham. https://doi.org/10.1007/978-3-319-70990-1_32
DOI: https://doi.org/10.1007/978-3-319-70990-1_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70989-5
Online ISBN: 978-3-319-70990-1
eBook Packages: Engineering, Engineering (R0)