
A method for consideration of conditional dependencies in the Fellegi and Sunter model of record linkage

Statistical Papers

Abstract

Record linkage aims to link two data files by identifying the elements they have in common. A popular model for this separation is the probabilistic model of Fellegi and Sunter. To estimate the parameters the model needs, a mixture model is usually constructed and the EM algorithm is applied. For simplicity, conditional independence is often assumed: if several attributes of the elements in the data are compared, the comparison results for the different attributes are independent within each mixture class. A mixture model constructed under this assumption has often been used. This article introduces a straightforward extension of the model that allows for conditional dependencies but depends heavily on the choice of the starting value. An estimation procedure for the starting value of the EM algorithm is therefore also proposed. The two models are compared empirically in a simulation study based on telephone book entries. In particular, the effect of different starting values and of conditional dependencies on the matching results is investigated.
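
To make the baseline concrete, the following is a minimal sketch of the conditional-independence mixture model that the abstract describes, fitted with the EM algorithm. It is not the article's extension for conditional dependencies, and it does not implement the proposed starting-value procedure. The function names (`em_independent_mixture`, `match_weight`), the default starting values, and the clipping constant are assumptions made for illustration only. Each record pair is represented as a tuple of 0/1 agreement indicators, one per compared attribute; `m[j]` and `u[j]` denote the probability of agreement on attribute j among matches and non-matches, and `p` is the proportion of matches.

```python
import math


def em_independent_mixture(pairs, p=0.1, m=None, u=None, max_iter=200, tol=1e-8):
    """Fit a two-class mixture of independent Bernoulli comparisons by EM.

    pairs: list of tuples of 0/1 agreement indicators, one tuple per record pair.
    p, m, u: starting values (hypothetical defaults; results can depend on them).
    Returns (p, m, u, post), where post[i] is the posterior match probability
    of pair i under the returned parameters.
    """
    k = len(pairs[0])
    n = len(pairs)
    m = list(m) if m is not None else [0.9] * k
    u = list(u) if u is not None else [0.1] * k
    eps = 1e-6  # keeps probabilities away from 0 and 1 (an assumption, for stability)
    clip = lambda x: min(max(x, eps), 1.0 - eps)
    prev_ll = -math.inf
    post = []
    for _ in range(max_iter):
        # E-step: posterior probability that each pair is a match.
        post = []
        ll = 0.0
        for g in pairs:
            lik_m, lik_u = p, 1.0 - p
            for j in range(k):
                # Conditional independence: the likelihood is a product over attributes.
                lik_m *= m[j] if g[j] else 1.0 - m[j]
                lik_u *= u[j] if g[j] else 1.0 - u[j]
            total = lik_m + lik_u
            post.append(lik_m / total)
            ll += math.log(total)
        # Stop once the observed-data log-likelihood no longer improves.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: weighted relative frequencies of agreement in each class.
        w = sum(post)
        p = clip(w / n)
        for j in range(k):
            agree_m = sum(r * g[j] for r, g in zip(post, pairs))
            agree_u = sum((1.0 - r) * g[j] for r, g in zip(post, pairs))
            m[j] = clip(agree_m / w)
            u[j] = clip(agree_u / (n - w))
    return p, m, u, post


def match_weight(g, m, u):
    """Fellegi-Sunter log2 likelihood-ratio weight of one comparison vector."""
    return sum(
        math.log2(m[j] / u[j]) if g[j] else math.log2((1.0 - m[j]) / (1.0 - u[j]))
        for j in range(len(g))
    )
```

Calling `em_independent_mixture(pairs)` on a list of comparison vectors returns the estimated parameters, and `match_weight` can then rank pairs for classification. Because EM only finds a local maximum of the likelihood, rerunning the sketch with different values of `p`, `m`, and `u` is a simple way to see the sensitivity to starting values that the abstract refers to.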

References

  • [Armstrong and Mayda 1992] Armstrong, J. B. and J. E. Mayda (1992). Estimation of record linkage models using dependent data. Proceedings of the Survey Research Method Section, American Statistical Association, 853–858.

  • [Dempster, Laird, and Rubin 1977] Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38.

  • [Fellegi and Sunter 1969] Fellegi, I. P. and A. B. Sunter (1969). A theory for record linkage. Journal of the American Statistical Association 64, 1183–1210.

  • [Kirkendall 1985] Kirkendall, N. J. (1985). Weights in computer matching: Applications and an information theoretic point of view. In: B. Kilss and W. Alvey (Eds.), Record Linkage Techniques-1985. Proceedings of the Workshop on Exact Matching Methodologies, Washington, DC: Dept. of Treasury, IRS, Statistics of Income Division, pp. 189–197.

  • [Larsen and Rubin 2001] Larsen, M. D. and D. B. Rubin (2001). Iterative automated record linkage using mixture models. Journal of the American Statistical Association 96, 32–41.

  • [Thibaudeau 1993] Thibaudeau, Y. (1993). The discrimination power of dependency structures in record linkage. Survey Methodology 31–38.

  • [Winkler 1988] Winkler, W. E. (1988). Using the EM algorithm for weight computation in the Fellegi-Sunter model of record linkage. Proceedings of the Survey Research Method Section, American Statistical Association, 667–671.

  • [Winkler 1989] Winkler, W. E. (1989). Methods for adjusting for lack of independence in an application of the Fellegi-Sunter model of record linkage. Survey Methodology 15, 101–117.

  • [Winkler 1995] Winkler, W. E. (1995). Matching and record linkage. In: B. Cox, D. Binder, B. Chinnappa, A. Christianson, M. Colledge, and P. Kott (Eds.), Business Survey Methods, New York: Wiley, pp. 355–384.

  • [Wu 1983] Wu, C. F. J. (1983). On the convergence properties of the EM algorithm. Annals of Statistics 11, 95–103.

Cite this article

Schürle, J. A method for consideration of conditional dependencies in the Fellegi and Sunter model of record linkage. Statistical Papers 46, 433–449 (2005). https://doi.org/10.1007/BF02762843

  • DOI: https://doi.org/10.1007/BF02762843