Applied Intelligence, Volume 35, Issue 3, pp 359–374

Meta similarity


DOI: 10.1007/s10489-010-0226-3

Cite this article as:
On, BW. & Lee, I. Appl Intell (2011) 35: 359. doi:10.1007/s10489-010-0226-3

Abstract

To decide whether two given strings match, various string similarity metrics have been employed. These metrics can be categorized into three classes: (a) edit-distance-based similarities, (b) token-based similarities, and (c) hybrid similarities. Because different types of string similarity have different strengths and weaknesses in measuring the similarity between two strings, the metrics in each class tend to work well for particular data sets. To address this problem, we propose a novel Meta Similarity that both (i) outperforms the existing similarity metrics and (ii) is the least affected by variation across data sets. Our claim is empirically validated through extensive experiments: our proposal improves average recall by up to 20% compared with the best of the existing similarity metrics, and it is the most stable method, with average recall ranging from 0.95 to 1.0 across all data sets.
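To make the three classes concrete, the sketch below implements one common representative of each: normalized Levenshtein similarity for (a), token-set Jaccard for (b), and Monge-Elkan with an edit-based inner measure for (c). The choice of these particular metrics, the function names, and the whitespace tokenization are illustrative assumptions; this is not the paper's Meta Similarity and does not reproduce its experimental setup.

```python
def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance: insert, delete, substitute each cost 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    # (a) Edit-distance-based: 1 - distance / longer length, in [0, 1].
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def jaccard_similarity(a: str, b: str) -> float:
    # (b) Token-based: overlap of lowercased, whitespace-separated token sets.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def monge_elkan_similarity(a: str, b: str, inner=edit_similarity) -> float:
    # (c) Hybrid: for each token of a, take its best edit-based match in b, then average.
    ta, tb = a.lower().split(), b.lower().split()
    if not ta and not tb:
        return 1.0
    if not ta or not tb:
        return 0.0
    return sum(max(inner(x, y) for y in tb) for x in ta) / len(ta)


if __name__ == "__main__":
    pairs = [("John Smith", "Smith John"), ("Jon Smith", "John Smith")]
    for s, t in pairs:
        print(s, "|", t,
              round(edit_similarity(s, t), 3),
              round(jaccard_similarity(s, t), 3),
              round(monge_elkan_similarity(s, t), 3))
```

On a reordered name such as "John Smith" vs. "Smith John", the edit-based score is low while the token-based and hybrid scores are both 1.0; on a misspelling such as "Jon Smith" vs. "John Smith", Jaccard drops to 1/3 while Monge-Elkan stays at 0.875. This kind of disagreement between classes is the motivation the abstract gives for a metric that is less sensitive to the data set. Note that Monge-Elkan as written is asymmetric in its arguments.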

Keywords

String similarity, Machine learning, Entity resolution, Linear systems

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Singapore Management University, Singapore, Singapore
  2. Troy University, Troy, USA
