A Graph Matching Method for Historical Census Household Linkage
Linking historical census data across time is challenging for several reasons, including data quality, limited information about individuals, and changes to households over time. Although most census data linking methods link records that correspond to individual household members, recent advances show that linking households as a whole provides more accurate results and fewer multiple household links. In this paper, we introduce a graph-based method for linking households that takes the structural relationships between household members into consideration. Starting from individual record linking results, our method builds a graph for each household, so that matches are determined by both attribute-level and record-relationship similarity. Experimental results on both synthetic and real historical census data validate the effectiveness of this method. The proposed method achieves an F-measure of 0.937 on data extracted from real UK census datasets, outperforming all the alternative methods it was compared against.
Keywords: graph matching, record linkage, household linkage, historical census data
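The idea of scoring a household pair by both attribute-level and record-relationship similarity can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the record fields (`name`, `age`), the use of age differences as edge attributes, the greedy record matching, and the parameters `threshold`, `alpha`, and `max_age_dev` are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_sim(a, b):
    """Approximate string similarity in [0, 1] (illustrative choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_records(h1, h2, threshold=0.8):
    """Greedy one-to-one record matching on name similarity.

    Stands in for the individual record linking step; returns a list of
    (index_in_h1, index_in_h2, similarity) tuples.
    """
    candidates = sorted(
        ((name_sim(r1["name"], r2["name"]), i, j)
         for i, r1 in enumerate(h1)
         for j, r2 in enumerate(h2)),
        reverse=True,
    )
    used1, used2, pairs = set(), set(), []
    for sim, i, j in candidates:
        if sim < threshold:
            break  # remaining candidates are all below the threshold
        if i not in used1 and j not in used2:
            pairs.append((i, j, sim))
            used1.add(i)
            used2.add(j)
    return pairs


def household_similarity(h1, h2, alpha=0.5, max_age_dev=3):
    """Combine vertex (attribute) and edge (relationship) similarity.

    Vertices are household members; an edge between two members carries
    their age difference, which should stay stable between censuses.
    """
    pairs = match_records(h1, h2)
    if not pairs:
        return 0.0
    # Vertex similarity: matched-record similarity, normalised by the
    # size of the larger household so unmatched members lower the score.
    vertex = sum(sim for _, _, sim in pairs) / max(len(h1), len(h2))
    # Edge similarity: compare age differences of each matched member pair.
    edge_scores = []
    for (i1, j1, _), (i2, j2, _) in combinations(pairs, 2):
        d1 = h1[i1]["age"] - h1[i2]["age"]
        d2 = h2[j1]["age"] - h2[j2]["age"]
        edge_scores.append(max(0.0, 1.0 - abs(d1 - d2) / max_age_dev))
    edge = sum(edge_scores) / len(edge_scores) if edge_scores else 0.0
    return alpha * vertex + (1 - alpha) * edge


# Hypothetical households: census_b is census_a ten years later, while
# other shares two names but has an inconsistent age structure.
census_a = [{"name": "John Smith", "age": 40},
            {"name": "Mary Smith", "age": 38},
            {"name": "Tom Smith", "age": 10}]
census_b = [{"name": "John Smith", "age": 50},
            {"name": "Mary Smith", "age": 48},
            {"name": "Tom Smith", "age": 20}]
other = [{"name": "John Smith", "age": 30},
         {"name": "Mary Smith", "age": 60}]

score_same = household_similarity(census_a, census_b)
score_other = household_similarity(census_a, other)
```

In this toy data the names alone cannot separate the two candidate households for John and Mary Smith; the edge term does, because the age gap between the matched members is preserved in `census_b` but not in `other`.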