Knowledge-Based Entity Resolution with Contextual Information Defined over a Monoid

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9344)

Abstract

Entity resolution (also known as record linkage) addresses the problem of deciding whether two entity representations in a database or stream correspond to the same real-world object. Knowledge-based entity resolution is grounded in knowledge patterns, which combine rules defined by Horn clauses with conditions prescribing when a rule is applicable and conditions specifying when its application is not permitted. So far, these positive and negative conditions have been expressed as bindings of the variables appearing in the Horn clause. In this paper the condition part of a knowledge pattern is generalised to a context, which is still defined by a positive and a negative part, but in both parts equations involving operators are permitted. The paper concentrates on constraints in a context that are conditions over a monoid. Under this generalisation, standard properties of knowledge patterns such as minimality, containment and optimality are investigated; together these minimise redundancy and thus optimise the inference of equivalences between entities.
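
To make the idea of a context with a positive and a negative part more concrete, the following is a minimal sketch only, not the formalisation used in the paper. It assumes the monoid is the free monoid of strings under concatenation, and it assumes that a rule is applicable when every positive equation holds and no negative equation holds. The names Equation, Context and applicable are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: a binding maps variable names to words (plain strings).
Binding = Dict[str, str]


@dataclass
class Equation:
    """An equation lhs = rhs over the string monoid (hypothetical helper).

    Each side is a list of symbols that are concatenated. A symbol that is
    a key of the binding is treated as a variable; any other symbol is
    treated as a constant word.
    """
    lhs: List[str]
    rhs: List[str]

    def holds(self, binding: Binding) -> bool:
        def word(side: List[str]) -> str:
            # Concatenation is the monoid operation; "" is its identity.
            return "".join(binding.get(symbol, symbol) for symbol in side)
        return word(self.lhs) == word(self.rhs)


@dataclass
class Context:
    """Positive and negative constraints guarding a rule (illustrative)."""
    positive: List[Equation] = field(default_factory=list)
    negative: List[Equation] = field(default_factory=list)

    def applicable(self, binding: Binding) -> bool:
        # Assumed semantics: all positive equations must hold,
        # and no negative equation may hold.
        return (all(eq.holds(binding) for eq in self.positive)
                and not any(eq.holds(binding) for eq in self.negative))


# Example: the rule may fire when x equals "Dr. " followed by y,
# except when y is the word "Smith".
ctx = Context(
    positive=[Equation(["x"], ["Dr. ", "y"])],
    negative=[Equation(["y"], ["Smith"])],
)
```

With this sketch, ctx.applicable({"x": "Dr. Jones", "y": "Jones"}) evaluates the positive equation "Dr. Jones" = "Dr. " + "Jones" and the negative equation "Jones" = "Smith". How the paper actually defines contexts, and how minimality, containment and optimality are decided, is not shown here.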

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Software Competence Center Hagenberg, Hagenberg, Austria
  2. The Australian National University, Canberra, Australia
  3. Johannes-Kepler-Universität Linz, Linz, Austria
