Tracking people over time in 19th century Canada for longitudinal analysis
Linking multiple databases to create longitudinal data is an important research problem with many applications. Longitudinal data allow analysts to perform studies that would otherwise be infeasible. We have linked historical census databases to create longitudinal data that allow people to be tracked over time. Social scientists and historians have already used these data to investigate historical trends and to address questions about society, history and the economy; such comparative, systematic research would not be possible without the linked data. The goal of the linking is to identify the same person in multiple census collections. Imprecision in historical census data and the lack of unique personal identifiers make this task challenging. In this paper we design and employ a record linkage system that incorporates a supervised learning module for classifying pairs of records as matches or non-matches. We show that our system performs large-scale linkage, producing high-quality links and generating enough longitudinal data to support meaningful social science studies. We demonstrate the impact of the longitudinal data through a study of economic change in 19th-century Canada.
Keywords: Record linkage, Classification, Historical census
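To make the idea of classifying record pairs as matches or non-matches concrete, the following is a minimal sketch and not the system described in the paper. The abstract does not specify the features or the classifier, so everything here is an assumption made for illustration: the record fields (`first`, `last`, `age`, `birthplace`), the ten-year gap between censuses, the string similarity measure from Python's `difflib`, the small hand-written logistic regression, and the toy records themselves.

```python
import math
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1] between two names (illustrative choice of measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(rec_a, rec_b, years_apart=10):
    """Feature vector for a candidate pair drawn from two censuses.

    years_apart is a hypothetical gap between the two census years.
    """
    age_gap = abs((rec_b["age"] - rec_a["age"]) - years_apart)
    return [
        name_similarity(rec_a["first"], rec_b["first"]),
        name_similarity(rec_a["last"], rec_b["last"]),
        1.0 / (1.0 + age_gap),  # 1.0 when the ages agree exactly
        1.0 if rec_a["birthplace"] == rec_b["birthplace"] else 0.0,
    ]


def train_logistic(X, y, epochs=500, lr=0.5):
    """Fit a logistic regression by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def is_match(w, b, x):
    """Classify a feature vector as match (True) or non-match (False)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0


# Toy records, invented for this sketch.
a1 = {"first": "John", "last": "McDonald", "age": 25, "birthplace": "Ontario"}
a2 = {"first": "Mary", "last": "Smith", "age": 30, "birthplace": "Quebec"}
a3 = {"first": "William", "last": "Smith", "age": 50, "birthplace": "Quebec"}
b1 = {"first": "Jon", "last": "MacDonald", "age": 35, "birthplace": "Ontario"}
b2 = {"first": "Mary", "last": "Smyth", "age": 41, "birthplace": "Quebec"}

# Hand-labelled training pairs: 1 = same person, 0 = different people.
labelled = [(a1, b1, 1), (a2, b2, 1), (a1, b2, 0), (a2, b1, 0), (a3, b2, 0)]
X = [pair_features(a, b) for a, b, _ in labelled]
y = [label for _, _, label in labelled]
w, b = train_logistic(X, y)
```

Calling `is_match(w, b, pair_features(a1, b1))` then classifies a candidate pair. In practice a linkage system would also need a blocking step to avoid comparing every record with every other record, and a far larger labelled set than the five pairs used here.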
The authors are grateful for financial support from the Canadian Foundation for Innovation, Ontario Ministry of Research and Innovation, Social Sciences and Humanities Research Council, Google and the University of Guelph. We would also like to thank our genealogical collaborators, the Ontario Genealogical Society, Ontario GenWeb and Family Search. Detailed comments from the reviewers and editors of this journal have helped us substantially to improve our work.