Abstract
Record linkage has a long tradition in both the statistical and the computer science literature. We survey current approaches to the record linkage problem in a privacy-aware setting and contrast these with the more traditional literature. We also identify several important open questions that pertain to private record linkage from different perspectives.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hall, R., Fienberg, S.E. (2010). Privacy-Preserving Record Linkage. In: Domingo-Ferrer, J., Magkos, E. (eds) Privacy in Statistical Databases. PSD 2010. Lecture Notes in Computer Science, vol 6344. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15838-4_24
DOI: https://doi.org/10.1007/978-3-642-15838-4_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15837-7
Online ISBN: 978-3-642-15838-4
eBook Packages: Computer Science, Computer Science (R0)