Data Matching pp 101-127

Field and Record Comparison

  • Peter Christen
Chapter
Part of the Data-Centric Systems and Applications book series (DCSA)

Abstract

At the heart of the data matching process lies the detailed comparison of records with each other. These comparisons are usually performed on several attributes (or fields) of the records, yielding a vector of numerical similarity values for each compared record pair. These similarity values are then used to decide whether the two records in a pair are a match (i.e. correspond to the same entity) or a non-match (i.e. correspond to two different entities). Even after the data to be matched have been pre-processed (cleaned, standardised and segmented), attribute values from different input databases are likely to contain variations and errors, so some kind of approximate or ‘fuzzy’ comparison function is required to calculate the similarities between attribute values. Most attributes used in data matching contain string values (such as names and addresses). This chapter presents the most commonly used approximate string comparison functions in detail and gives an overview of several more recently developed ones. An experimental comparison of the presented functions on a data set of real name values illustrates how the calculated similarity values differ. Comparison functions for numerical data, dates, ages, times, geographic locations, and more complex types of data are also discussed.
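As an illustration of the idea described in the abstract, the following minimal sketch computes an edit-distance-based similarity for each attribute of a record pair and collects the results into a similarity vector. It is not taken from the chapter: the function names (`edit_distance`, `edit_similarity`, `compare_records`), the field names and the sample records are hypothetical, and normalising the distance by the length of the longer string is just one common choice assumed here.

```python
from typing import Dict, List


def edit_distance(s: str, t: str) -> int:
    """Levenshtein edit distance, computed by dynamic programming.

    Only two rows of the dynamic programming table are kept in memory.
    """
    previous = list(range(len(t) + 1))  # distances from "" to prefixes of t
    for i, cs in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # delete cs
                current[j - 1] + 1,      # insert ct
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def edit_similarity(s: str, t: str) -> float:
    """Map the edit distance to a similarity in [0, 1].

    Assumption: normalise by the length of the longer string.
    """
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(s, t) / longest


def compare_records(rec_a: Dict[str, str], rec_b: Dict[str, str],
                    fields: List[str]) -> List[float]:
    """Return one similarity value per compared attribute."""
    return [edit_similarity(rec_a[f], rec_b[f]) for f in fields]


if __name__ == "__main__":
    # Hypothetical, already pre-processed (lower-cased) records.
    a = {"given_name": "peter", "surname": "christen", "suburb": "canberra"}
    b = {"given_name": "pete", "surname": "christian", "suburb": "canberra"}
    print(compare_records(a, b, ["given_name", "surname", "suburb"]))
```

A classifier, or simply a threshold on the summed values, could then use such a vector to label the pair as a match or a non-match. Other comparison functions would slot in by replacing `edit_similarity` for the relevant fields.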

Keywords

Comparison Function; Edit Distance; Optical Character Recognition; Input String; Dynamic Programming Approach
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Peter Christen, Research School of Computer Science, The Australian National University, Canberra, Australia
