Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8182)

Abstract

The Web has deepened rapidly as online databases have become prevalent. Records from different Web sources often use different representations to refer to the same real-world entity. There is therefore high demand, in many Web applications such as comparison shopping, for identifying these entities across multiple Web databases. In this paper, we propose an effective entity identification approach based on a similarity function. Moreover, we develop query-based dynamic weight techniques for this approach. Experimental results show that our approach can effectively discover the records that represent the same real-world entity.
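The abstract does not give the similarity function or the weighting scheme, so the following is only a generic sketch of the idea of matching records by a weighted attribute similarity. Everything in it is an assumption made for illustration: the helper names (`field_similarity`, `dynamic_weights`, `record_similarity`, `same_entity`), the use of Python's `difflib` string ratio as the per-attribute measure, the rule that attributes mentioned in the query receive a larger weight, and the `boost` and `threshold` values. It should not be read as the authors' method.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]; 0 if either value is missing."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def dynamic_weights(base_weights, query_attrs, boost=2.0):
    """Hypothetical rule: boost attributes named in the query, then normalize to sum to 1."""
    raw = {k: w * (boost if k in query_attrs else 1.0) for k, w in base_weights.items()}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


def record_similarity(r1, r2, weights):
    """Weighted sum of per-attribute similarities."""
    return sum(w * field_similarity(r1.get(k, ""), r2.get(k, "")) for k, w in weights.items())


def same_entity(r1, r2, weights, threshold=0.8):
    """Declare a match when the weighted similarity reaches the (assumed) threshold."""
    return record_similarity(r1, r2, weights) >= threshold


# Two records from different (made-up) sources plus one unrelated record.
a = {"title": "Canon EOS 70D", "brand": "Canon"}
b = {"title": "canon eos 70d", "brand": "CANON"}
c = {"title": "Nikon D7100", "brand": "Nikon"}

# The query mentioned "title", so that attribute gets the larger weight.
weights = dynamic_weights({"title": 1.0, "brand": 1.0}, query_attrs={"title"})
print(same_entity(a, b, weights), same_entity(a, c, weights))
```

Normalizing the weights keeps the score in [0, 1] regardless of which attributes the query touches, so a single threshold can be reused across queries.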

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, F., Meng, Y., Wei, M., Li, Q. (2014). Entity Identification in Deep Web. In: Huang, Z., Liu, C., He, J., Huang, G. (eds) Web Information Systems Engineering – WISE 2013 Workshops. WISE 2013. Lecture Notes in Computer Science, vol 8182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54370-8_13

  • DOI: https://doi.org/10.1007/978-3-642-54370-8_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54369-2

  • Online ISBN: 978-3-642-54370-8

  • eBook Packages: Computer Science (R0)