Hashing-Based Distributed Multi-party Blocking for Privacy-Preserving Record Linkage

  • Thilina Ranbaduge
  • Dinusha Vatsalan
  • Peter Christen
  • Vassilios Verykios
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9652)


In many application domains, organizations need to integrate information from multiple sources. Due to privacy and confidentiality concerns, these organizations are often unwilling or not allowed to reveal their sensitive and personal data to other database owners or to any external party. This has led to the emerging research discipline of privacy-preserving record linkage (PPRL). We propose a novel blocking approach for multi-party PPRL that efficiently and effectively prunes the record sets that are unlikely to match. Our approach allows each database owner to perform blocking independently, except for an initial agreement on parameter settings and a final central hashing-based clustering step. We provide an analysis of our technique in terms of complexity, quality, and privacy, and conduct an empirical study with large datasets. The results show that our approach scales with the size of the datasets and the number of parties, while providing better quality and privacy than previous multi-party private blocking approaches.
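
To make the general idea of hashing-based blocking more concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the paper. It assumes records are encoded as Bloom filters over character q-grams and that blocking keys come from bit-sampling (Hamming) locality sensitive hashing. All function names, the filter length, the number of hash functions, and the shared seed are hypothetical choices standing in for the parameters the parties would agree on.

```python
import hashlib
import random


def qgrams(value, q=2):
    """Split a string into its set of overlapping character q-grams."""
    value = value.lower()
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def bloom_filter(value, length=64, num_hashes=4):
    """Encode the q-grams of a value as a list of 0/1 bits (assumed encoding)."""
    bits = [0] * length
    for gram in qgrams(value):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).hexdigest()
            bits[int(digest, 16) % length] = 1
    return bits


def blocking_keys(bits, num_tables=3, key_bits=8, seed=42):
    """Bit-sampling LSH: build one key per table from sampled bit positions.

    The shared seed is a stand-in for the agreed parameter settings, so every
    party samples the same positions without exchanging records.
    """
    rng = random.Random(seed)
    keys = []
    for table in range(num_tables):
        positions = rng.sample(range(len(bits)), key_bits)
        keys.append((table, "".join(str(bits[p]) for p in positions)))
    return keys


def build_blocks(records, **params):
    """Group record identifiers by blocking key, run locally by each party."""
    blocks = {}
    for rec_id, value in records.items():
        for key in blocking_keys(bloom_filter(value), **params):
            blocks.setdefault(key, set()).add(rec_id)
    return blocks
```

In this sketch each party would call `build_blocks` on its own records. Because the hash functions and sampled positions are identical across parties, identical values fall into the same blocks, and similar values share a block with a probability that grows with their similarity.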


Locality sensitive hashing · Clustering · Bloom filters



This research is funded by the Australian Research Council under Discovery Project DP130101801. We would also like to thank Dimitrios Karapiperis for his valuable feedback.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Thilina Ranbaduge (1)
  • Dinusha Vatsalan (1)
  • Peter Christen (1)
  • Vassilios Verykios (2)
  1. Research School of Computer Science, The Australian National University, Canberra, Australia
  2. School of Science and Technology, Hellenic Open University, Patras, Greece
