Abstract
The Web has rapidly deepened with the prevalence of online databases. Records from different Web sources often use different representations to refer to the same real-world entity. Consequently, many Web applications, such as comparison shopping, need to identify such entities across multiple Web databases. In this paper, we propose an entity identification approach based on a similarity function, and we develop query-based dynamic weighting techniques for it. Experimental results show that our approach effectively discovers records that represent the same real-world entity.
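To make the idea of a weighted similarity function concrete, the following is a minimal sketch and not the method of the paper, whose details are not given in this abstract. It assumes records are dictionaries of string fields, uses `difflib.SequenceMatcher` as a stand-in string similarity, and adopts a hypothetical weighting rule: fields constrained by the query are down-weighted because every returned record shares them. The names `dynamic_weights`, `record_similarity`, `same_entity`, the `discount` factor, and the `threshold` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; difflib's ratio is an illustrative choice."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def dynamic_weights(base_weights, query_fields, discount=0.2):
    """Hypothetical rule: scale down the weight of fields that appear in the
    query, then renormalize so the weights sum to 1."""
    raw = {
        field: weight * (discount if field in query_fields else 1.0)
        for field, weight in base_weights.items()
    }
    total = sum(raw.values())
    return {field: weight / total for field, weight in raw.items()}


def record_similarity(r1, r2, weights):
    """Weighted sum of per-field similarities; missing fields count as empty."""
    return sum(
        weight * field_similarity(r1.get(field, ""), r2.get(field, ""))
        for field, weight in weights.items()
    )


def same_entity(r1, r2, weights, threshold=0.8):
    """Declare a match when the weighted similarity reaches the threshold."""
    return record_similarity(r1, r2, weights) >= threshold


# Illustrative records from two hypothetical shopping sources.
base = {"title": 0.5, "brand": 0.3, "model": 0.2}
weights = dynamic_weights(base, query_fields={"brand"})
rec_a = {"title": "Canon EOS 5D Mark II Body", "brand": "Canon", "model": "5D Mark II"}
rec_b = {"title": "Canon EOS 5D Mark II (Body Only)", "brand": "Canon", "model": "5D MarkII"}
print(record_similarity(rec_a, rec_b, weights))
```

In this sketch the query on `brand` shifts weight toward `title` and `model`, so two records are compared mainly on the fields that can still tell them apart.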
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, F., Meng, Y., Wei, M., Li, Q. (2014). Entity Identification in Deep Web. In: Huang, Z., Liu, C., He, J., Huang, G. (eds) Web Information Systems Engineering – WISE 2013 Workshops. WISE 2013. Lecture Notes in Computer Science, vol 8182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54370-8_13
DOI: https://doi.org/10.1007/978-3-642-54370-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54369-2
Online ISBN: 978-3-642-54370-8
eBook Packages: Computer Science, Computer Science (R0)