Entity Resolution on Uncertain Relations

  • Huabin Feng
  • Hongzhi Wang
  • Jianzhong Li
  • Hong Gao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7923)

Abstract

Entity resolution plays a pivotal role in many application areas. Because uncertainty arises in many applications, such as information extraction and online product categories, entity resolution must also be applied to uncertain data. This uncertainty makes it impossible to apply traditional techniques directly. In this paper, we propose techniques to perform entity resolution on uncertain data. First, we propose a new probabilistic similarity metric for uncertain tuples. Second, based on this metric, we propose novel pruning techniques that efficiently perform the pairwise similarity join of uncertain tuples without enumerating all possible worlds. Finally, we propose a density-based clustering algorithm to combine the results of the pairwise similarity join. Through extensive experimental evaluation on synthetic and real-world data sets, we demonstrate the benefits and features of our approaches.
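
To make the possible-worlds setting concrete, the following is a minimal sketch of the naive baseline that pruning techniques of this kind aim to avoid: it enumerates every pair of alternatives of two uncertain tuples and sums the probability of the pairs whose Jaccard similarity reaches a threshold. It is not the metric or the algorithm proposed in the paper. The data model (each uncertain tuple is a list of mutually exclusive token-set alternatives with probabilities, independent across tuples), the function names `jaccard` and `match_probability`, and the threshold `tau` are all assumptions made for illustration.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_probability(u, v, tau):
    """Probability that two uncertain tuples have Jaccard similarity >= tau.

    Assumed (hypothetical) data model: u and v are lists of
    (token_set, probability) alternatives. Alternatives of one tuple are
    mutually exclusive, and the two tuples are independent, so each pair
    of alternatives is one possible world with probability pu * pv.
    """
    return sum(
        pu * pv
        for (su, pu), (sv, pv) in product(u, v)
        if jaccard(su, sv) >= tau
    )


# Illustrative data only: one tuple with two alternatives, one certain tuple.
u = [({"apple", "iphone", "5"}, 0.5), ({"apple", "ipod"}, 0.5)]
v = [({"apple", "iphone", "5", "black"}, 1.0)]

print(match_probability(u, v, tau=0.7))
```

The cost of this enumeration grows with the product of the numbers of alternatives, which is why avoiding the enumeration of all possible worlds matters for a pairwise join over a whole relation.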

Keywords

Probability Threshold, Uncertain Data, Jaccard Similarity, Pruning Technique, Entity Resolution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Huabin Feng (1)
  • Hongzhi Wang (1)
  • Jianzhong Li (1)
  • Hong Gao (1)
  1. Harbin Institute of Technology, China
