Statistical Papers, Volume 46, Issue 3, pp 433–449

A method for consideration of conditional dependencies in the Fellegi and Sunter model of record linkage

  • Josef Schürle
Article

DOI: 10.1007/BF02762843

Cite this article as:
Schürle, J. Statistical Papers (2005) 46: 433. doi:10.1007/BF02762843

Abstract

An objective of record linkage is to link two data files by identifying the elements they have in common. A popular model for this task is the probabilistic one of Fellegi and Sunter. To estimate the parameters the model needs, a mixture model is usually constructed and the EM algorithm is applied. For simplification, conditional independence is often assumed: if several attributes of the elements are compared, the comparison results for the different attributes are independent within each mixture class. A mixture model built on this assumption has often been used. This article introduces a straightforward extension of the model that allows for conditional dependencies but depends heavily on the choice of starting value. An estimation procedure for the EM algorithm's starting value is therefore also proposed. The two models are compared empirically in a simulation study based on telephone book entries. In particular, the effect of different starting values and of conditional dependencies on the matching results is investigated.
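
As a rough illustration of the baseline described in the abstract, the following is a minimal sketch of EM estimation for a two-class mixture (matches and non-matches) under conditional independence. It is not the article's code and does not implement the proposed extension with conditional dependencies. The function names, the simulated data, the clamping constant and the default starting values (0.1 for the match proportion, 0.9 and 0.1 for the agreement probabilities) are all assumptions made for illustration only.

```python
import math
import random

EPS = 1e-6  # assumed clamp that keeps probabilities away from exactly 0 or 1


def _clamp(x):
    return min(max(x, EPS), 1.0 - EPS)


def _likelihood(pattern, probs):
    # Under conditional independence the class likelihood is a product
    # over attributes of P(agree) or P(disagree).
    return math.prod(p if g else 1.0 - p for g, p in zip(pattern, probs))


def em_fellegi_sunter(patterns, p_match=0.1, m=None, u=None, n_iter=100):
    """EM for a two-class mixture of independent agreement indicators.

    patterns: list of 0/1 tuples, one indicator per compared attribute.
    p_match:  starting value for the proportion of matches.
    m, u:     starting values for P(agree | match), P(agree | non-match).
    """
    k = len(patterns[0])
    n = len(patterns)
    m = list(m) if m is not None else [0.9] * k
    u = list(u) if u is not None else [0.1] * k
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        weights = []
        for g in patterns:
            lm = p_match * _likelihood(g, m)
            lu = (1.0 - p_match) * _likelihood(g, u)
            weights.append(lm / (lm + lu))
        # M-step: weighted relative frequencies of agreement.
        total = sum(weights)
        p_match = total / n
        m = [_clamp(sum(w * g[j] for w, g in zip(weights, patterns)) / total)
             for j in range(k)]
        u = [_clamp(sum((1.0 - w) * g[j] for w, g in zip(weights, patterns))
                    / (n - total)) for j in range(k)]
    return p_match, m, u


def match_weight(pattern, m, u):
    """Fellegi-Sunter log-likelihood-ratio weight of one comparison pattern."""
    return sum(math.log(mj / uj) if g else math.log((1.0 - mj) / (1.0 - uj))
               for g, mj, uj in zip(pattern, m, u))


def simulate_patterns(n, p_match, m, u, seed=0):
    """Draw n comparison patterns from the independence model (toy data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        probs = m if rng.random() < p_match else u
        data.append(tuple(int(rng.random() < p) for p in probs))
    return data


# Toy run on simulated data; the "true" parameters here are made up.
data = simulate_patterns(2000, 0.1, [0.95, 0.9, 0.85], [0.1, 0.05, 0.2])
p_hat, m_hat, u_hat = em_fellegi_sunter(data)
print(p_hat, m_hat, u_hat)
```

Because EM only finds a local maximum of the likelihood, rerunning this sketch with different values of `p_match`, `m` and `u` can lead to different estimates, which is the sensitivity to starting values that the abstract refers to.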

Keywords

Exact Matching · Probabilistic Matching · Mixture Model · Incomplete Data · Maximum-Likelihood Estimation · EM Algorithm · Simulation Study

Copyright information

© Springer-Verlag 2005

Authors and Affiliations

  • Josef Schürle (1)
  1. Department of Statistics, Econometrics and Operations Research, University of Tübingen, Tübingen, Germany