Abstract
In many application domains, organizations need to integrate information from multiple sources. Because of privacy and confidentiality concerns, these organizations are often unwilling or not allowed to reveal their sensitive and personal data to other database owners or to any external party. This has led to the emerging research discipline of privacy-preserving record linkage (PPRL). We propose a novel blocking approach for multi-party PPRL that efficiently and effectively prunes record sets that are unlikely to match. Our approach allows each database owner to perform blocking independently, except for an initial agreement on parameter settings and a final central hashing-based clustering. We provide an analysis of our technique in terms of complexity, quality, and privacy, and conduct an empirical study with large datasets. The results show that our approach scales with the size of the datasets and the number of parties, while providing better quality and privacy than previous multi-party private blocking approaches.
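The abstract does not spell out the protocol, so the following is only a minimal sketch of the general pattern it describes: parties agree on parameters, each party blocks its own records locally, and a central step groups the hashed block keys. It uses generic MinHash banding as a stand-in technique, not the paper's actual method. All names here (`qgrams`, `minhash_signature`, `local_block_keys`, `central_grouping`) and the parameters `num_hashes`, `band_size`, and `secret` are hypothetical and chosen for illustration.

```python
import hashlib
from collections import defaultdict


def qgrams(value, q=2):
    """Split a string into its set of character q-grams."""
    s = value.lower()
    if len(s) < q:
        return {s}
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def minhash_signature(tokens, num_hashes, secret):
    """Keyed MinHash: for each hash index keep the smallest digest over all tokens."""
    return [
        min(hashlib.sha256(f"{secret}|{i}|{t}".encode()).hexdigest() for t in tokens)
        for i in range(num_hashes)
    ]


def local_block_keys(records, num_hashes=8, band_size=2, secret="shared-secret"):
    """Run by each party on its own data (assumed parameters agreed beforehand).

    records maps record id -> attribute value.
    Returns a dict mapping hashed block key -> set of local record ids.
    """
    blocks = defaultdict(set)
    for rid, value in records.items():
        sig = minhash_signature(qgrams(value), num_hashes, secret)
        for b in range(0, num_hashes, band_size):
            band = "|".join(sig[b:b + band_size])
            key = hashlib.sha256(f"{b}|{band}".encode()).hexdigest()[:16]
            blocks[key].add(rid)
    return blocks


def central_grouping(party_blocks):
    """Central step: keep only block keys reported by every party.

    party_blocks maps party name -> that party's blocks dict.
    Only hashed keys and record ids are seen here, never attribute values.
    """
    common = set.intersection(*(set(b) for b in party_blocks.values()))
    return {
        key: {party: sorted(blocks[key]) for party, blocks in party_blocks.items()}
        for key in common
    }


if __name__ == "__main__":
    a = local_block_keys({"a1": "peter christen", "a2": "xyz"})
    b = local_block_keys({"b1": "peter christen"})
    c = local_block_keys({"c1": "peter christen"})
    for key, members in central_grouping({"A": a, "B": b, "C": c}).items():
        print(key, members)
```

In this sketch, records with similar q-gram sets are likely to share at least one band key, so only those groups would be passed on to a later comparison step. How the paper itself forms and clusters blocks may differ substantially.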
Notes
1. Available from: ftp://alt.ncsbe.gov/data/.
Acknowledgements
This research is funded by the Australian Research Council under Discovery Project DP130101801. We would also like to thank Dimitrios Karapiperis for his valuable feedback.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ranbaduge, T., Vatsalan, D., Christen, P., Verykios, V. (2016). Hashing-Based Distributed Multi-party Blocking for Privacy-Preserving Record Linkage. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_33
DOI: https://doi.org/10.1007/978-3-319-31750-2_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31749-6
Online ISBN: 978-3-319-31750-2
eBook Packages: Computer Science, Computer Science (R0)