Journal of Intelligent Information Systems, Volume 43, Issue 1, pp 101–127

Discriminative and deterministic approaches towards entity resolution

  • Byung-Won On
  • Ingyu Lee
  • Gyu Sang Choi
  • Ho-Sik Park

DOI: 10.1007/s10844-014-0308-5

Cite this article as:
On, BW., Lee, I., Choi, G.S. et al. J Intell Inf Syst (2014) 43: 101. doi:10.1007/s10844-014-0308-5

Abstract

To address the entity resolution problem, existing studies usually follow two steps. Given two lists of records, the first step uses index structures and algorithms to select a small set of candidate duplicate records (a candidate set), which keeps entity resolution efficient. The second step applies a given similarity function to quantify the similarity of the records in the candidate set. In real applications, however, selecting appropriate indexing techniques and similarity functions is a non-trivial task. In this paper, we tackle the problem of identifying the indexing technique and similarity function with both discriminative and deterministic approaches that select the best indexing and similarity measures. In our experiments, the proposed solution, which combines the discriminative and deterministic approaches, achieves more than 90 % average accuracy within hundreds of seconds.
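The two-step pipeline described in the abstract (blocking to build a candidate set, then a similarity function applied only to that set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. The blocking key (first three characters of the first token), the two candidate measures (token Jaccard and character-bigram Dice), the 0.5 threshold, and every helper name are hypothetical. The selection step picks the measure with the highest accuracy on labeled candidate pairs. It is a simple deterministic stand-in and does not reproduce the SVM-based discriminative approach named in the paper's keywords.

```python
from collections import defaultdict


def blocking_key(record):
    # Hypothetical blocking key: first three lowercase characters of the first token.
    tokens = record.lower().split()
    return tokens[0][:3] if tokens else ""


def candidate_pairs(left, right):
    """Step 1: standard blocking. Only records that share a key are compared."""
    blocks = defaultdict(list)
    for j, r in enumerate(right):
        blocks[blocking_key(r)].append(j)
    pairs = []
    for i, l in enumerate(left):
        for j in blocks.get(blocking_key(l), []):
            pairs.append((i, j))
    return pairs


def jaccard(a, b):
    """Token-level Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def bigram_dice(a, b):
    """Character-bigram Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    def grams(s):
        s = s.lower()
        return {s[k:k + 2] for k in range(len(s) - 1)}
    ga, gb = grams(a), grams(b)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def resolve(left, right, sim, threshold):
    """Step 2: apply the similarity function to the candidate set only."""
    return [(i, j) for i, j in candidate_pairs(left, right)
            if sim(left[i], right[j]) >= threshold]


def select_measure(left, right, truth, measures, threshold):
    """Pick the measure whose match/non-match decisions on the candidate
    pairs agree most often with the labeled ground truth."""
    pairs = candidate_pairs(left, right)
    if not pairs:
        raise ValueError("blocking produced no candidate pairs")
    scores = {}
    for name, sim in measures.items():
        correct = sum((sim(left[i], right[j]) >= threshold) == ((i, j) in truth)
                      for i, j in pairs)
        scores[name] = correct / len(pairs)
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For example, with `left = ["john smith", "mary jones", "peter brown"]` and `right = ["john smyth", "mary jones", "pete brown", "johnny cash"]`, blocking keeps 4 of the 12 possible pairs. Bigram Dice is selected over token Jaccard because it tolerates small spelling variations such as "smith"/"smyth" and "peter"/"pete".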

Keywords

Entity resolution, Approximate string matching, Similarities, Support vector machines, Blocking techniques

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Byung-Won On (1)
  • Ingyu Lee (1)
  • Gyu Sang Choi (2)
  • Ho-Sik Park (3)
  1. Advanced Institutes of Convergence Technology, Seoul National University, Gyeonggi-do, Korea
  2. Department of Information and Communication Engineering, Yeungnam University, Gyeongsangbuk, Korea
  3. Division of Information and Computer Engineering, Ajou University, Gyeonggi-do, Korea
