Data Matching pp 129-162


  • Peter Christen
Part of the Data-Centric Systems and Applications book series (DCSA)


The objective of the classification step in data matching and deduplication is to decide whether a pair or group of records corresponds to a match or a non-match, based on the detailed attribute (or field) comparisons conducted in the comparison step. Matching records are assumed to refer to the same real-world entity, while non-matching records are assumed to refer to different entities. A variety of classification techniques have been developed for data matching over the past four decades. They range from simple threshold-based approaches that classify each candidate record pair individually to sophisticated ‘collective’ classifiers that aim to generate an overall optimal classification of all candidate record pairs, taking into account constraints such as one-to-one match restrictions. This chapter describes the major classification techniques for data matching and deduplication, and discusses issues relevant to the classification process, such as matching restrictions and the merging of matched records.
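The simple, threshold-based end of this range, and the effect of a one-to-one match restriction, can be illustrated with a short Python sketch. It is not the chapter's own code. It assumes that the comparison step has produced, for each candidate pair, a vector of attribute similarities in [0, 1], and that a pair is classified by the sum of these similarities against a lower and an upper threshold, with pairs in between left as potential matches for manual review. The function names, thresholds and example data are hypothetical. The one-to-one step is a simple greedy heuristic, not one of the optimal collective classifiers described in the chapter.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (record id in database A, record id in database B)
# Hypothetical comparison-step output: attribute similarities in [0, 1]
# (for example name, address, date of birth) for each candidate pair.
ComparisonVectors = Dict[Pair, List[float]]


def classify_threshold(vectors: ComparisonVectors,
                       lower: float, upper: float) -> Dict[Pair, str]:
    """Classify each candidate pair independently by its summed similarity.

    Scores >= upper are matches, scores < lower are non-matches, and
    everything in between is a potential match for manual review.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    labels = {}
    for pair, sims in vectors.items():
        total = sum(sims)
        if total >= upper:
            labels[pair] = "match"
        elif total < lower:
            labels[pair] = "non-match"
        else:
            labels[pair] = "potential"
    return labels


def enforce_one_to_one(vectors: ComparisonVectors,
                       labels: Dict[Pair, str]) -> Dict[Pair, str]:
    """Greedy one-to-one restriction (a heuristic, not an optimal method).

    Matches are accepted from the highest summed similarity downwards;
    a match that reuses an already matched record is demoted to non-match.
    """
    matches = [p for p, lab in labels.items() if lab == "match"]
    # Highest score first; ties are broken by pair ids for determinism.
    matches.sort(key=lambda p: (-sum(vectors[p]), p))
    used_a, used_b = set(), set()
    result = dict(labels)
    for a, b in matches:
        if a in used_a or b in used_b:
            result[(a, b)] = "non-match"
        else:
            used_a.add(a)
            used_b.add(b)
    return result


# Hypothetical example: record a1 looks similar to both b1 and b2.
vectors = {
    ("a1", "b1"): [1.0, 0.9, 1.0],
    ("a1", "b2"): [0.9, 0.8, 1.0],
    ("a2", "b2"): [0.6, 0.5, 0.4],
    ("a3", "b3"): [0.1, 0.2, 0.0],
}
labels = classify_threshold(vectors, lower=1.0, upper=2.5)
final = enforce_one_to_one(vectors, labels)
print(final)
```

With these thresholds, both (a1, b1) and (a1, b2) are classified as matches when pairs are considered individually. The one-to-one step keeps only the higher-scoring (a1, b1). Demoting the conflicting pair to a non-match rather than to a potential match is an arbitrary choice in this sketch; such conflicts could equally be sent for clerical review.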



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Peter Christen (1)
  1. Research School of Computer Science, The Australian National University, Canberra, Australia
