A Graph Matching Method for Historical Census Household Linkage

  • Zhichun Fu
  • Peter Christen
  • Jun Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8443)

Abstract

Linking historical census data across time is challenging for several reasons, including data quality issues, limited information about individuals, and changes to households over time. Although most census data linking methods link records that correspond to individual household members, recent advances show that linking households as a whole provides more accurate results and fewer multiple-household links. In this paper, we introduce a graph-based method for linking households that takes the structural relationships between household members into consideration. Based on individual record linking results, our method builds a graph for each household, so that matches are determined by both attribute-level and record-relationship similarity. Our experimental results on both synthetic and real historical census data validate the effectiveness of this method. The proposed method achieves an F-measure of 0.937 on data extracted from real UK census datasets, outperforming all alternative methods compared.
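The idea of scoring a candidate household pair by both attribute-level and record-relationship similarity can be illustrated with a minimal sketch. This is not the authors' implementation: the `Person` fields, the scoring functions, the age-difference edge attribute, the tolerance of 5 years, and the weight `alpha` are all assumptions chosen for illustration. Vertices are records already paired by a prior record-level linkage step, and edges compare how a within-household relationship (here, the age difference between two members) is preserved across the two censuses.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Person:
    """One census record; fields are illustrative, not the paper's schema."""
    name: str
    age: int


def record_similarity(a, b, years_apart, tolerance=5.0):
    """Hypothetical attribute-level (vertex) similarity in [0, 1]."""
    name_score = 1.0 if a.name.lower() == b.name.lower() else 0.0
    # A person should have aged by roughly the gap between the censuses.
    age_gap = abs((b.age - a.age) - years_apart)
    age_score = max(0.0, 1.0 - age_gap / tolerance)
    return 0.5 * name_score + 0.5 * age_score


def edge_similarity(h1, h2, p, q, tolerance=5.0):
    """Hypothetical relationship (edge) similarity in [0, 1].

    The age difference between two members of the same household
    should stay about the same from one census to the next.
    """
    (i, j), (k, l) = p, q
    d1 = h1[i].age - h1[k].age
    d2 = h2[j].age - h2[l].age
    return max(0.0, 1.0 - abs(d1 - d2) / tolerance)


def household_similarity(h1, h2, pairs, years_apart, alpha=0.5):
    """Weighted mix of mean vertex and mean edge similarity.

    pairs: (i, j) index pairs from a prior record-level linkage step.
    With fewer than two pairs there are no edges, so the edge term is 0.
    """
    if not pairs:
        return 0.0
    vertex = sum(
        record_similarity(h1[i], h2[j], years_apart) for i, j in pairs
    ) / len(pairs)
    edges = list(combinations(pairs, 2))
    edge = (
        sum(edge_similarity(h1, h2, p, q) for p, q in edges) / len(edges)
        if edges
        else 0.0
    )
    return alpha * vertex + (1.0 - alpha) * edge


# Made-up households, ten years apart.
earlier = [Person("John Smith", 40), Person("Mary Smith", 38), Person("Tom Smith", 10)]
later = [Person("John Smith", 50), Person("Mary Smith", 48), Person("Thomas Smith", 20)]
decoy = [Person("John Smith", 50), Person("Mary Smith", 30), Person("Tom Smith", 35)]
pairs = [(0, 0), (1, 1), (2, 2)]

score_true = household_similarity(earlier, later, pairs, years_apart=10)
score_decoy = household_similarity(earlier, decoy, pairs, years_apart=10)
print(score_true > score_decoy)
```

In this toy data the decoy household shares names with the earlier one, but its ages and age differences are inconsistent, so both the vertex and edge terms penalise it. A fuller treatment would also have to choose which household pairs to accept when one household has several candidates, which this sketch leaves out.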

Keywords

graph matching, record linkage, household linkage, historical census data

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Zhichun Fu (1)
  • Peter Christen (1)
  • Jun Zhou (2)
  1. Research School of Computer Science, The Australian National University, Canberra, Australia
  2. School of Information and Communication Technology, Griffith University, Nathan, Australia
