Hashing-Based Distributed Multi-party Blocking for Privacy-Preserving Record Linkage

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9652)

Abstract

In many application domains, organizations need to integrate information from multiple sources. Due to privacy and confidentiality concerns, these organizations are often unwilling or not permitted to reveal their sensitive and personal data to other database owners or to any external party. This has led to the emerging research discipline of privacy-preserving record linkage (PPRL). We propose a novel blocking approach for multi-party PPRL that efficiently and effectively prunes the record sets that are unlikely to match. Our approach allows each database owner to perform blocking independently, except for an initial agreement on parameter settings and a final central hashing-based clustering step. We provide an analysis of our technique in terms of complexity, quality, and privacy, and conduct an empirical study with large datasets. The results show that our approach scales with both the size of the datasets and the number of parties, while providing better quality and privacy than previous multi-party private blocking approaches.
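
To make the workflow in the abstract concrete, the following is a minimal sketch of generic hashing-based blocking across several parties. It is not the algorithm of this paper: it assumes a plain MinHash signature over character bigrams with banding, and every name in it (`qgrams`, `minhash_signature`, `local_blocks`, `central_grouping`, the `"shared-seed"` value, and the toy records) is hypothetical. It only illustrates the three stages the abstract names: parties agree on parameters, each party blocks its own records independently, and a central step groups the hashed block keys.

```python
import hashlib
from collections import defaultdict


def qgrams(value, q=2):
    """Return the set of character q-grams of a string."""
    value = value.lower()
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def minhash_signature(tokens, num_hashes, seed):
    """For each salted hash function, keep the minimum hash over all tokens."""
    signature = []
    for i in range(num_hashes):
        salt = f"{seed}:{i}:".encode()
        signature.append(min(
            int.from_bytes(hashlib.sha256(salt + t.encode()).digest()[:8], "big")
            for t in tokens
        ))
    return tuple(signature)


def local_blocks(records, num_bands, rows_per_band, seed):
    """Run by each party on its own data; maps hashed band keys to record ids."""
    blocks = defaultdict(set)
    for rec_id, value in records.items():
        tokens = qgrams(value)
        if not tokens:  # value shorter than q: nothing to hash
            continue
        sig = minhash_signature(tokens, num_bands * rows_per_band, seed)
        for b in range(num_bands):
            band = sig[b * rows_per_band:(b + 1) * rows_per_band]
            key = hashlib.sha256(repr((b, band)).encode()).hexdigest()
            blocks[key].add(rec_id)
    return blocks


def central_grouping(party_blocks):
    """Central step: keep only the block keys that every party reported."""
    common = set.intersection(*(set(blocks) for blocks in party_blocks))
    return {key: [sorted(blocks[key]) for blocks in party_blocks] for key in common}


# Assumed shared parameters, agreed on by all parties up front.
NUM_BANDS, ROWS_PER_BAND, SEED = 4, 2, "shared-seed"

# Toy records (invented for illustration only).
party_a = {"a1": "smith john", "a2": "zhang wei"}
party_b = {"b1": "smith john", "b2": "oliveira ana"}
party_c = {"c1": "smith john", "c2": "kowalski piotr"}

candidates = central_grouping([
    local_blocks(p, NUM_BANDS, ROWS_PER_BAND, SEED)
    for p in (party_a, party_b, party_c)
])
for key, groups in candidates.items():
    print(key[:8], groups)
```

In this sketch only hashed keys and record identifiers leave each party, and records whose keys are not shared by all parties are pruned before any comparison. How much such keys leak in practice depends on the parameters and on the adversary model, which this example does not address.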

Notes

  1. Available from: ftp://alt.ncsbe.gov/data/.

Acknowledgements

This research is funded by the Australian Research Council under Discovery Project DP130101801. We would also like to thank Dimitrios Karapiperis for his valuable feedback.

Author information

Corresponding author

Correspondence to Thilina Ranbaduge.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ranbaduge, T., Vatsalan, D., Christen, P., Verykios, V. (2016). Hashing-Based Distributed Multi-party Blocking for Privacy-Preserving Record Linkage. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_33

  • DOI: https://doi.org/10.1007/978-3-319-31750-2_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31749-6

  • Online ISBN: 978-3-319-31750-2

  • eBook Packages: Computer Science (R0)
