Record Linkage Using Graph Consistency
This paper presents a method for automated record linkage in the historical domain based on collective entity resolution. Multiple records are considered for linkage simultaneously, using plausible record sequences as a substitute for pair-wise record similarity measures such as string edit distance. The method is applied to the problem of family reconstruction from historical archives. A benchmark evaluation shows that the approach provides a computationally efficient way to produce family reconstructions which are useful in practice. Further improvements in linkage accuracy are expected from addressing data issues and violations of the linkage assumptions.
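To make the idea of a "plausible record sequence" concrete, the following is a minimal sketch and not the paper's algorithm. It assumes three hypothetical record types (birth, marriage, death), exact name equality in place of any similarity measure, and illustrative age bounds (`MIN_MARRIAGE_AGE`, `MAX_LIFESPAN`) that do not come from the paper. All names in the code are invented for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Record:
    kind: str  # "birth", "marriage", or "death" (hypothetical record types)
    name: str
    year: int


# Illustrative plausibility bounds; assumptions, not values from the paper.
MIN_MARRIAGE_AGE = 15
MAX_LIFESPAN = 110


def is_plausible(birth: Record, marriage: Record, death: Record) -> bool:
    """Return True if the three records could describe one person's life."""
    # Simplifying assumption: names must match exactly.
    if not (birth.name == marriage.name == death.name):
        return False
    # The person must be old enough at marriage.
    if marriage.year - birth.year < MIN_MARRIAGE_AGE:
        return False
    # Death cannot precede marriage.
    if death.year < marriage.year:
        return False
    # The implied lifespan must stay within the assumed bound.
    return death.year - birth.year <= MAX_LIFESPAN


def plausible_sequences(births, marriages, deaths):
    """Enumerate all birth-marriage-death triples that pass the checks."""
    return [seq for seq in product(births, marriages, deaths)
            if is_plausible(*seq)]


births = [Record("birth", "Jan Jansen", 1820),
          Record("birth", "Jan Jansen", 1850)]
marriages = [Record("marriage", "Jan Jansen", 1845)]
deaths = [Record("death", "Jan Jansen", 1890)]

matches = plausible_sequences(births, marriages, deaths)
```

In this toy data both birth records carry the same name, so a pair-wise name comparison cannot tell them apart. Evaluating the whole sequence at once rules out the 1850 birth, because it postdates the 1845 marriage. The brute-force enumeration is only for illustration and would not scale to a real archive.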
Keywords: Edit Distance, Record Linkage, Partial Family, Candidate Match, Multiple Instance Learning