A method for consideration of conditional dependencies in the Fellegi and Sunter model of record linkage
- Cite this article as:
- Schürle, J. Statistical Papers (2005) 46: 433. doi:10.1007/BF02762843
An objective of record linkage is to link two data files by identifying the elements they have in common. A popular model for separating matching from non-matching pairs of elements is the probabilistic model of Fellegi and Sunter. To estimate the parameters this model needs, a mixture model is usually constructed and fitted with the EM algorithm. For simplicity, conditional independence is often assumed: when several attributes of the elements are compared, the comparison results for the different attributes are taken to be independent within each mixture class. Mixture models built on this assumption are widely used. This article introduces a straightforward extension of the model that allows for conditional dependencies but depends heavily on the choice of the starting value. Therefore, an estimation procedure for the starting value of the EM algorithm is also proposed. The two models are compared empirically in a simulation study based on telephone book entries. In particular, the article investigates how different starting values and conditional dependencies affect the matching results.
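The baseline model described above, a two-class mixture of matches and non-matches fitted by EM under conditional independence, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the article's method: comparisons are assumed to be binary (1 = attributes agree, 0 = disagree), the starting values `p=0.1`, `m_init=0.9`, `u_init=0.1` are fixed guesses rather than the article's proposed starting-value estimator, and the conditional-dependency extension is not implemented. The function names `em_conditional_independence` and `match_weight` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-6  # keeps probabilities away from 0 and 1 so logs and ratios stay finite


def _clamp(x: float) -> float:
    return min(max(x, EPS), 1.0 - EPS)


def em_conditional_independence(
    gammas: Sequence[Sequence[int]],
    p: float = 0.1,       # assumed starting guess for P(match)
    m_init: float = 0.9,  # assumed starting guess for every m_k
    u_init: float = 0.1,  # assumed starting guess for every u_k
    n_iter: int = 200,
    tol: float = 1e-8,
) -> Tuple[float, List[float], List[float]]:
    """Fit a two-class mixture (M = matches, U = non-matches) to binary
    comparison vectors by EM, assuming the K comparisons are independent
    within each class (the conditional-independence assumption).

    Returns (p, m, u): p = P(M), m[k] = P(gamma_k = 1 | M),
    u[k] = P(gamma_k = 1 | U).
    """
    n, k = len(gammas), len(gammas[0])
    m, u = [m_init] * k, [u_init] * k
    for _ in range(n_iter):
        # E-step: posterior match probability of every record pair.
        w = []
        for g in gammas:
            pm, pu = p, 1.0 - p
            for j, gj in enumerate(g):
                pm *= m[j] if gj else 1.0 - m[j]
                pu *= u[j] if gj else 1.0 - u[j]
            w.append(pm / (pm + pu))
        # M-step: class proportion and weighted agreement frequencies.
        sw = sum(w)
        sv = max(n - sw, 1e-12)
        new_p = _clamp(sw / n)
        new_m = [_clamp(sum(wi * g[j] for wi, g in zip(w, gammas)) / sw) for j in range(k)]
        new_u = [_clamp(sum((1.0 - wi) * g[j] for wi, g in zip(w, gammas)) / sv) for j in range(k)]
        delta = max(abs(a - b) for a, b in zip([new_p] + new_m + new_u, [p] + m + u))
        p, m, u = new_p, new_m, new_u
        if delta < tol:
            break
    return p, m, u


def match_weight(g: Sequence[int], m: Sequence[float], u: Sequence[float]) -> float:
    """Fellegi-Sunter log-likelihood-ratio weight log(P(g | M) / P(g | U)),
    which factorizes over attributes under conditional independence."""
    return sum(
        math.log(mj / uj) if gj else math.log((1.0 - mj) / (1.0 - uj))
        for gj, mj, uj in zip(g, m, u)
    )
```

EM only reaches a local maximum, so the estimates depend on the starting values; choosing `m_init > u_init` also fixes which mixture class is interpreted as the matches. Pairs are then classified by comparing `match_weight` against thresholds. Dropping conditional independence means replacing the per-attribute products in the E-step with joint probabilities, which is where the sensitivity to starting values described in the abstract becomes important.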