Clustering-Based Scalable Indexing for Multi-party Privacy-Preserving Record Linkage

  • Thilina Ranbaduge
  • Dinusha Vatsalan
  • Peter Christen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9078)


The identification of common sets of records in multiple databases has become an increasingly important subject in many application areas, including banking, health, and national security. Often, privacy concerns and regulations prevent the owners of the databases from sharing any sensitive details of their records with each other or with any other party. The linkage of records in multiple databases while preserving privacy and confidentiality is an emerging research discipline known as privacy-preserving record linkage (PPRL). We propose a novel two-step indexing (blocking) approach for PPRL between multiple (more than two) parties. First, we generate small mini-blocks using a multi-bit Bloom filter splitting method; second, we merge these mini-blocks based on their similarity using a novel hierarchical canopy clustering technique. An empirical study conducted with large datasets of up to one million records shows that our approach is scalable with the size of the datasets and the number of parties, while providing better privacy than previous multi-party indexing approaches.
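The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. Several things here are assumptions made for illustration only: the bigram encoding, the double-hashing scheme, the fixed list of split positions, and every function name and parameter. In particular, the merge step is a simple greedy grouping by Hamming distance between mini-block keys, used as a stand-in for the hierarchical canopy clustering the paper proposes.

```python
import hashlib
from collections import defaultdict


def qgrams(value, q=2):
    """Split a string into its set of character q-grams (bigrams by default)."""
    value = value.lower()
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def bloom_filter(value, length=64, num_hashes=2):
    """Encode a string as a list of 0/1 bits (assumed double-hashing scheme)."""
    bits = [0] * length
    for gram in qgrams(value):
        h1 = int(hashlib.sha1(gram.encode()).hexdigest(), 16)
        h2 = int(hashlib.sha256(gram.encode()).hexdigest(), 16)
        for i in range(num_hashes):
            bits[(h1 + i * h2) % length] = 1
    return bits


def dice(a, b):
    """Dice coefficient of two equal-length bit lists."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0


def mini_blocks(filters, positions):
    """Step 1 (sketch): group records by the bit pattern at the chosen positions.

    `positions` is a hypothetical, pre-agreed list of bit indices; how the
    positions are selected is not covered here.
    """
    blocks = defaultdict(list)
    for rec_id, bits in filters.items():
        key = "".join(str(bits[p]) for p in positions)
        blocks[key].append(rec_id)
    return dict(blocks)


def hamming(k1, k2):
    """Number of positions at which two equal-length keys differ."""
    return sum(c1 != c2 for c1, c2 in zip(k1, k2))


def merge_mini_blocks(blocks, max_distance=1):
    """Step 2 (stand-in): greedily merge mini-blocks whose keys are close.

    This is NOT hierarchical canopy clustering; it only shows the idea of
    merging similar mini-blocks into larger blocks.
    """
    merged = []  # list of (seed_key, record_ids)
    for key in sorted(blocks):
        for seed, members in merged:
            if hamming(seed, key) <= max_distance:
                members.extend(blocks[key])
                break
        else:
            merged.append((key, list(blocks[key])))
    return [members for _, members in merged]
```

In this sketch, each party would call `bloom_filter` on its own records, then `mini_blocks` with the same `positions`, so that records sharing a bit pattern fall into the same mini-block; `merge_mini_blocks` then groups nearby patterns so that no block is too small.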


Hierarchical canopy clustering, Bloom filters, Scalability





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Thilina Ranbaduge (1)
  • Dinusha Vatsalan (1)
  • Peter Christen (1)
  1. Research School of Computer Science, College of Engineering and Computer Science, The Australian National University, Canberra, Australia
