Evaluation of Matching Quality and Complexity
There are various ways to evaluate the outcomes of a data matching project. Usually the most important is the quality of the matched data. It can be measured as how many of the matched record pairs correspond to true matches (i.e. refer to the same entity), and how many of the known true matching record pairs have been detected and classified as matches. Also of importance is the performance of a data matching system, in terms of how much time and how many computing resources it requires to conduct a given data matching project.

This chapter covers these issues and presents measures for assessing both data matching quality (Sect. 7.2) and complexity (Sect. 7.3). Important issues in measuring matching quality and complexity are discussed, and pitfalls to avoid are highlighted. The second part of this chapter covers further topics related to the evaluation of data matching. Section 7.4 discusses the manual clerical review process, which has been, and still is, required in many traditional data matching systems. A major challenge for researchers working in data matching is how to acquire (real-world) test data that allow the experimental evaluation of new data matching algorithms and techniques; this topic is covered in Sect. 7.5. The alternative of using synthetically generated data with characteristics similar to those of real-world data is then discussed in Sect. 7.6.
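The two quality aspects described at the start of this chapter, the proportion of matched record pairs that are true matches and the proportion of known true matches that were found, are commonly called precision and recall. The following is a minimal Python sketch, not the measures as defined in Sect. 7.2. It assumes that both the classified matches and the known true matches are given as sets of record-identifier pairs, and that a pair is unordered. The function names `normalise` and `matching_quality` are hypothetical and used only for illustration.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # assumption: records are identified by string IDs


def normalise(pairs: Iterable[Pair]) -> Set[Pair]:
    # Treat (a, b) and (b, a) as the same record pair by sorting the IDs
    return {tuple(sorted(p)) for p in pairs}


def matching_quality(classified_matches: Iterable[Pair],
                     true_matches: Iterable[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) for a set of classified matches."""
    m = normalise(classified_matches)
    t = normalise(true_matches)
    tp = len(m & t)  # classified matches that are also true matches
    # Precision: fraction of classified matches that are true matches
    precision = tp / len(m) if m else 0.0
    # Recall: fraction of known true matches that were classified as matches
    recall = tp / len(t) if t else 0.0
    return precision, recall


if __name__ == "__main__":
    classified = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
    truth = {("b1", "a1"), ("a2", "b2"), ("a4", "b4"), ("a5", "b5")}
    print(matching_quality(classified, truth))  # (0.6666666666666666, 0.5)
```

Here two of the three classified pairs are true matches (precision 2/3), and two of the four known true matches were found (recall 1/2). Returning 0.0 for an empty set is one possible convention; other treatments of this edge case are equally defensible.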