Probabilistic Entity Linkage for Heterogeneous Information Spaces

  • Ekaterini Ioannou
  • Claudia Niederée
  • Wolfgang Nejdl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5074)

Abstract

Heterogeneous information spaces are typically created by merging data from a variety of applications and information sources. These sources often use different identifiers for data that describe the same real-world entity (for example, an artist, a conference, or an organization). In this paper we propose a new probabilistic Entity Linkage algorithm for identifying and linking data that refer to the same real-world entity.

Our approach focuses on managing entity linkage information in heterogeneous information spaces using probabilistic methods. We use a Bayesian network to model the evidence supporting possible object matches, together with the interdependencies between those matches. This enables us to update the network flexibly when new information becomes available and to cope with the different requirements imposed by applications built on top of information spaces.
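The abstract describes the approach only at a high level. As a hedged illustration of what modelling match evidence and the interdependencies between possible matches in a Bayesian network can look like, here is a minimal sketch. The four-node structure, the variable names (M1, M2, E1, E2), and every probability are invented assumptions, not taken from the paper; inference is exact but done by brute-force enumeration, which only works for toy networks.

```python
from itertools import product

# Hypothetical network (NOT the paper's model), all variables binary:
#   M1 -> M2   if two person references co-refer, a related pair of
#              publication references is more likely to co-refer too
#   M1 -> E1, M2 -> E2   observed similarity evidence for each candidate pair
# All numbers below are invented for illustration.
P_M1 = {True: 0.1, False: 0.9}              # prior P(M1)
P_M2_given_M1 = {True: 0.6, False: 0.05}    # P(M2=True | M1)
P_E_given_M = {True: 0.9, False: 0.2}       # P(E=True | M), shared by E1 and E2


def bernoulli(p_true, value):
    """Probability of a binary outcome given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(m1, m2, e1, e2):
    """Joint probability, factorised along the network's edges."""
    return (P_M1[m1]
            * bernoulli(P_M2_given_M1[m1], m2)
            * bernoulli(P_E_given_M[m1], e1)
            * bernoulli(P_E_given_M[m2], e2))


def posterior(query, evidence):
    """P(query=True | evidence), summing the joint over consistent assignments."""
    num = den = 0.0
    for m1, m2, e1, e2 in product([True, False], repeat=4):
        assignment = {"M1": m1, "M2": m2, "E1": e1, "E2": e2}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # skip assignments that contradict the observations
        p = joint(m1, m2, e1, e2)
        den += p
        if assignment[query]:
            num += p
    return num / den


if __name__ == "__main__":
    evidence = {"E1": True}
    print(posterior("M1", evidence))
    evidence["E2"] = True  # new information about the second pair arrives
    print(posterior("M1", evidence))
```

With these made-up numbers, observing E2 raises P(M1) from about 0.33 to about 0.57: the dependency M1 → M2 lets evidence about one candidate match propagate to another. Adding information only extends the conditioning set, which loosely mirrors the flexible updates mentioned above; a realistic system would replace enumeration with a proper inference algorithm.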

Keywords

entity linkage, data integration, metadata management

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ekaterini Ioannou (1)
  • Claudia Niederée (1)
  • Wolfgang Nejdl (1)
  1. L3S Research Center/Leibniz Universität Hannover, Hannover, Germany