Entity Identification in Deep Web

Jiang, Fangjiao; Meng, Yuehong; Wei, Mingsheng; Li, Quanbin

doi:10.1007/978-3-642-54370-8_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8182)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

The Web has deepened rapidly as online databases have become prevalent. Records from different Web sources often use different representations to refer to the same real-world entity. There is therefore a high demand for identifying these entities across multiple Web databases in many Web applications, e.g., comparison shopping. In this paper, we propose an effective entity identification approach based on a similarity function. Moreover, we develop query-based dynamic weight techniques in our approach. Experimental results show that our approach can effectively discover the records that represent the same real-world entity.
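
The abstract does not give the similarity function or the weighting scheme, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a weighted sum of per-attribute similarities, uses token-level Jaccard similarity as a stand-in attribute measure, and assumes that attributes constrained by the user's query receive a boosted weight before normalization. All names (`jaccard`, `dynamic_weights`, `record_similarity`, `same_entity`), the `boost` factor, and the `threshold` value are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (a stand-in attribute measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty values are treated as identical
    return len(ta & tb) / len(ta | tb)


def dynamic_weights(base: dict, query_attrs: set, boost: float = 2.0) -> dict:
    """Boost attributes that appear in the query, then normalize to sum to 1.

    Assumption: the query-based adjustment is a simple multiplicative boost.
    """
    raw = {k: w * (boost if k in query_attrs else 1.0) for k, w in base.items()}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted sum of per-attribute similarities between two records."""
    return sum(w * jaccard(r1.get(k, ""), r2.get(k, "")) for k, w in weights.items())


def same_entity(r1: dict, r2: dict, base: dict, query_attrs: set,
                threshold: float = 0.6) -> bool:
    """Declare a match when the weighted similarity reaches the threshold."""
    weights = dynamic_weights(base, query_attrs)
    return record_similarity(r1, r2, weights) >= threshold


if __name__ == "__main__":
    base = {"title": 1.0, "author": 1.0}
    r1 = {"title": "deep web", "author": "a b"}
    r2 = {"title": "deep web", "author": "c d"}
    # A query on "title" shifts weight toward the attribute that agrees.
    print(same_entity(r1, r2, base, query_attrs={"title"}))
    print(same_entity(r1, r2, base, query_attrs=set()))
```

In this sketch the same pair of records can fall on either side of the threshold depending on which attributes the query emphasizes, which is the intuition behind making the weights depend on the query.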

Author information

Authors and Affiliations

School of Physics and Electronic Engineering, Jiangsu Normal University, Jiangsu, China
Fangjiao Jiang, Yuehong Meng, Mingsheng Wei & Quanbin Li

Editor information

Editors and Affiliations

Vrije University of Amsterdam, De Boelelaan 1081, 1081, Amsterdam, The Netherlands
Zhisheng Huang
Faculty of Information and Communication Technologies, Swinburne University of Technology, PO Box 218, 3122, Melbourne, VIC, Australia
Chengfei Liu
College of Engineering and Science, Victoria University, PO Box 14428, 8001, Melbourne, VIC, Australia
Jing He
Centre for Applied Informatics, Victoria University, PO Box 14428, 8001, Melbourne, VIC, Australia
Guangyan Huang

About this paper

Cite this paper

Jiang, F., Meng, Y., Wei, M., Li, Q. (2014). Entity Identification in Deep Web. In: Huang, Z., Liu, C., He, J., Huang, G. (eds) Web Information Systems Engineering – WISE 2013 Workshops. WISE 2013. Lecture Notes in Computer Science, vol 8182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54370-8_13

DOI: https://doi.org/10.1007/978-3-642-54370-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54369-2
Online ISBN: 978-3-642-54370-8
eBook Packages: Computer Science (R0)