Abstract
Probabilistic uncertainty in record linkage affects statistical analyses of linked data, such as regression analysis. This paper considers Bayesian regression analysis with linked data and shows that, even under the usual normal regression model, least-squares-type estimators of the regression coefficients are not always adequate. A method is proposed that uses the distribution of the response variable. This method is related to finite mixture analysis and leads to more accurate estimates. A simple approach is also proposed to increase tractability and reduce the number of mixture components. A Monte Carlo simulation study is performed to assess the proposed approach.
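As a rough illustration of why least-squares-type estimators can be inadequate with linked data, the following Python sketch simulates a simple linear regression in which a fraction of the responses is attached to the wrong record. It is not the method proposed in the paper: the mismatch mechanism (a wrongly linked record receives the response of a uniformly chosen record), the parameter values, and the names `ols_slope` and `simulate` are all assumptions made for this example.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope for a simple regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=5000, beta=2.0, mismatch_rate=0.3, seed=0):
    """Return (slope from correctly linked data, slope from error-prone links).

    Assumption: each record is wrongly linked with probability
    `mismatch_rate`, in which case its response is replaced by the
    response of a record drawn uniformly at random.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]

    # Build the "linked" responses under the assumed mismatch mechanism.
    linked_y = list(y)
    for i in range(n):
        if rng.random() < mismatch_rate:
            linked_y[i] = y[rng.randrange(n)]

    return ols_slope(x, y), ols_slope(x, linked_y)


if __name__ == "__main__":
    true_slope, linked_slope = simulate()
    print("slope with correct links:    ", round(true_slope, 3))
    print("slope with error-prone links:", round(linked_slope, 3))
```

Under these assumptions the naive slope is pulled toward zero, shrinking by roughly a factor of one minus the mismatch rate. This is the kind of bias that motivates modelling the response through a mixture rather than relying on least squares alone.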
Cite this article
Fallah, A., Mohammadzadeh, M. Bayesian regression analysis with linked data using mixture normal distributions. Stat Papers 51, 421–430 (2010). https://doi.org/10.1007/s00362-009-0208-x
DOI: https://doi.org/10.1007/s00362-009-0208-x