Abstract
One objective of record linkage is to link two data files by identifying the elements they have in common. A popular model for separating matching from non-matching pairs is the probabilistic model of Fellegi and Sunter. To estimate the parameters the model requires, a mixture model is usually constructed and the EM algorithm is applied. For simplicity, conditional independence is often assumed: when several attributes of the elements are compared, the comparison results for the different attributes are independent within each mixture class. Mixture models built on this assumption have been widely used. This article introduces a straightforward extension of the model that allows for conditional dependencies but depends heavily on the choice of starting value. An estimation procedure for the EM algorithm's starting value is therefore also proposed. The two models are compared empirically in a simulation study based on telephone book entries, with particular attention to the effect of different starting values and conditional dependencies on the matching results.
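To make the baseline concrete, the following is a minimal sketch of the conditional-independence mixture fitted by EM. It is not the extended model or the starting-value procedure proposed in the article. The function names, the simulated data, the true parameter values and the starting values are all assumptions chosen for illustration. Each record pair is reduced to a binary comparison vector (1 = attribute agrees), `m[j]` and `u[j]` denote the agreement probabilities for attribute `j` among matches and non-matches, and `p` is the proportion of matches.

```python
import random

EPS = 1e-6  # keeps estimated probabilities away from 0 and 1


def clamp(x):
    return min(max(x, EPS), 1.0 - EPS)


def simulate_comparisons(n, p, m, u, seed=0):
    """Draw n binary comparison vectors from a two-class mixture (illustrative data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        probs = m if rng.random() < p else u  # pair is a match with probability p
        data.append(tuple(int(rng.random() < q) for q in probs))
    return data


def em_conditional_independence(data, p, m, u, n_iter=50):
    """EM for a two-class mixture with attributes independent within each class."""
    k, n = len(data[0]), len(data)
    m, u = list(m), list(u)
    for _ in range(n_iter):
        # E-step: posterior probability that each pair belongs to the match class.
        post = []
        for g in data:
            like_m, like_u = p, 1.0 - p
            for j in range(k):
                like_m *= m[j] if g[j] else 1.0 - m[j]
                like_u *= u[j] if g[j] else 1.0 - u[j]
            post.append(like_m / (like_m + like_u))
        # M-step: posterior-weighted relative frequencies of agreement.
        total = max(sum(post), EPS)
        rest = max(n - total, EPS)
        p = clamp(total / n)
        m = [clamp(sum(w * g[j] for w, g in zip(post, data)) / total) for j in range(k)]
        u = [clamp(sum((1.0 - w) * g[j] for w, g in zip(post, data)) / rest) for j in range(k)]
    return p, m, u, post


# Assumed true parameters and starting values, for illustration only.
true_p, true_m, true_u = 0.2, [0.9, 0.85, 0.95, 0.8], [0.1, 0.2, 0.05, 0.15]
data = simulate_comparisons(3000, true_p, true_m, true_u, seed=1)
p_hat, m_hat, u_hat, post = em_conditional_independence(
    data, p=0.1, m=[0.8] * 4, u=[0.2] * 4)
print(round(p_hat, 3), [round(x, 2) for x in m_hat], [round(x, 2) for x in u_hat])
```

Because the sketch factorises the likelihood over attributes, it cannot represent dependencies between comparison results within a class. Rerunning it with different values for the starting `p`, `m` and `u` gives a simple way to observe how the estimates react to the starting value.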
References
[Armstrong and Mayda 1992] Armstrong, J. B. and J. E. Mayda (1992). Estimation of record linkage models using dependent data. Proceedings of the Survey Research Method Section, American Statistical Association, 853–858.
[Dempster, Laird, and Rubin 1977] Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38.
[Fellegi and Sunter 1969] Fellegi, I. P. and A. B. Sunter (1969). A theory for record linkage. Journal of the American Statistical Association 64, 1183–1210.
[Kirkendall 1985] Kirkendall, N. J. (1985). Weights in computer matching: Applications and an information theoretic point of view. In B. Kilss and W. Alvey (Eds.), Record Linkage Techniques-1985. Proceedings of the Workshop on Exact Matching Methodologies, Washington, DC: Dept. of Treasury, IRS, Statistics of Income Division, pp. 189–197.
[Larsen and Rubin 2001] Larsen, M. D. and D. B. Rubin (2001). Iterative automated record linkage using mixture models. Journal of the American Statistical Association 96, 32–41.
[Thibaudeau 1993] Thibaudeau, Y. (1993). The discrimination power of dependency structures in record linkage. Survey Methodology 31–38.
[Winkler 1988] Winkler, W. E. (1988). Using the EM algorithm for weight computation in the Fellegi-Sunter model of record linkage. Proceedings of the Survey Research Method Section, American Statistical Association, 667–671.
[Winkler 1989] Winkler, W. E. (1989). Methods for adjusting for lack of independence in an application of the Fellegi-Sunter model of record linkage. Survey Methodology 15, 101–117.
[Winkler 1995] Winkler, W. E. (1995). Matching and record linkage. In B. Cox, D. Binder, B. Chinnappa, A. Christianson, M. Colledge, and P. Kott (Eds.), Business Survey Methods, New York: Wiley, pp. 355–384.
[Wu 1983] Wu, C. F. J. (1983). On the convergence properties of the EM algorithm. Annals of Statistics 11, 95–103.
Cite this article
Schürle, J. A method for consideration of conditional dependencies in the Fellegi and Sunter model of record linkage. Statistical Papers 46, 433–449 (2005). https://doi.org/10.1007/BF02762843
DOI: https://doi.org/10.1007/BF02762843