Data Matching, pp 163-184

Evaluation of Matching Quality and Complexity

  • Peter Christen
Part of the Data-Centric Systems and Applications book series (DCSA)


The outcomes of a data matching project can be evaluated in various ways. The quality of the matched data is usually the most important aspect. It can be measured by how many of the record pairs classified as matches correspond to true matches (i.e. refer to the same entity), and by how many of the known true matching record pairs have been detected and classified as matches. Also important is the performance of a data matching system, that is, how much time and computing resources a given data matching project requires.

This chapter covers these issues and presents measures for assessing both data matching quality (Sect. 7.2) and complexity (Sect. 7.3). Important issues in measuring matching quality and complexity are discussed, and pitfalls to avoid are highlighted. The second part of the chapter covers further topics related to evaluating data matching. Section 7.4 discusses the manual clerical review process, which has been, and still is, required in many traditional data matching systems. A major challenge for researchers working in data matching is how to acquire (real-world) test data that allow experimental evaluation of new data matching algorithms and techniques. This topic is covered in Sect. 7.5. The alternative of using synthetically generated data with characteristics similar to real-world data is then discussed in Sect. 7.6.
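The two quality notions described in the abstract can be illustrated with a minimal sketch. It assumes they correspond to the measures commonly called precision and recall (names the abstract itself does not use). The function name `matching_quality` and the record identifiers are hypothetical and chosen only for illustration.

```python
def matching_quality(classified_matches, true_matches):
    """Return (precision, recall) for two collections of record pairs.

    Assumption: a record pair is a 2-tuple of record identifiers, and
    the order of the two identifiers within a pair does not matter.
    """
    # Normalise pairs so that (a, b) and (b, a) count as the same pair.
    classified = {frozenset(pair) for pair in classified_matches}
    truth = {frozenset(pair) for pair in true_matches}

    # Pairs classified as matches that really are true matches.
    true_positives = len(classified & truth)

    # Share of classified matches that are true matches.
    precision = true_positives / len(classified) if classified else 0.0
    # Share of known true matches that were detected.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example: three pairs classified as matches, four true matches.
classified = [("a1", "b1"), ("a2", "b2"), ("a3", "b7")]
truth = [("a1", "b1"), ("a2", "b2"), ("a4", "b4"), ("a5", "b5")]
precision, recall = matching_quality(classified, truth)
```

In this made-up example, two of the three classified pairs are true matches, and two of the four known true matches were found. The two values can differ considerably, which is why both views are reported rather than a single count.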


Keywords: Receiver Operating Characteristic Curve; True Match; Indexing Technique; Data Match; Potential Match
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Peter Christen (1)
  1. Research School of Computer Science, The Australian National University, Canberra, Australia
