Record Linkage Using Graph Consistency

  • Marijn Schraagen
  • Walter Kosters
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8556)

Abstract

This paper provides a method for automated record linkage in the historical domain based on collective entity resolution. Multiple records are considered for linkage simultaneously, using plausible record sequences as a substitute for pair-wise record similarity measures such as string edit distance. The method is applied to the problem of family reconstruction from historical archives. A benchmark evaluation shows that the approach provides a computationally efficient way to produce family reconstructions that are useful in practice. Further improvements in linkage accuracy are expected by addressing data issues and linkage assumption violations.
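The contrast drawn in the abstract, between a pair-wise string measure and a check on whole record sequences, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state their consistency rules, so the `Record` type, the `ORDER` table, the `is_plausible_sequence` function, and the `max_lifespan` threshold are all hypothetical names and assumptions chosen for illustration. Only the edit distance itself is the standard Levenshtein definition.

```python
from dataclasses import dataclass
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


@dataclass(frozen=True)
class Record:
    kind: str   # hypothetical event type: "birth", "marriage", or "death"
    year: int


# Assumed life-course order, used only for this illustration.
ORDER = {"birth": 0, "marriage": 1, "death": 2}


def is_plausible_sequence(records: Sequence[Record],
                          max_lifespan: int = 100) -> bool:
    """Return True if the events, sorted by year, follow the assumed
    life-course order and span at most max_lifespan years."""
    if not records:
        return True
    ordered = sorted(records, key=lambda r: (r.year, ORDER[r.kind]))
    ranks = [ORDER[r.kind] for r in ordered]
    if ranks != sorted(ranks):
        return False
    return ordered[-1].year - ordered[0].year <= max_lifespan
```

A pair-wise measure scores two name strings in isolation, for example `edit_distance("Jansen", "Janssen")`, whereas the sequence check accepts or rejects a candidate group of records as a whole, which is the general idea behind considering multiple records simultaneously.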

Keywords

Edit Distance, Record Linkage, Partial Family, Candidate Match, Multiple Instance Learning
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Marijn Schraagen (1)
  • Walter Kosters (1)
  1. Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands