A Constraint Satisfaction Cryptanalysis of Bloom Filters in Private Record Linkage

  • Mehmet Kuzu
  • Murat Kantarcioglu
  • Elizabeth Durham
  • Bradley Malin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6794)


For over fifty years, “record linkage” procedures have been refined to integrate data in the face of typographical and semantic errors. These procedures are traditionally performed over personal identifiers (e.g., names), but in modern decentralized environments, privacy concerns have led to regulations that require the obfuscation of such attributes. Various techniques have been proposed to resolve this tension, including secure multi-party computation protocols; however, such protocols are computationally intensive and do not scale to real-world linkage scenarios. More recently, procedures based on Bloom filter encoding (BFE) have gained traction in various applications, such as healthcare, where they yield highly accurate record linkage results in a reasonable amount of time. Though this model is promising, no formal security analysis has been designed for or applied to it, which is of concern given the sensitivity of the corresponding data. In this paper, we introduce a novel attack, based on constraint satisfaction, to provide a rigorous analysis of BFE, along with guidelines for mitigating the risk posed by the attack. In addition, we conduct an empirical analysis with data derived from public voter records to illustrate the feasibility of the attack. Our investigations show that the parameters of the BFE protocol can be configured to make it relatively resilient to the proposed attack without significant reduction in record linkage performance.
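For readers unfamiliar with BFE, the following is a minimal sketch of the general scheme popularized by Schnell, Bachteler, and Reiher (2009): a name is split into padded bigrams, each bigram is hashed to k positions of an m-bit filter by double hashing with two keyed HMACs, and two encodings are compared with the Dice coefficient. The parameter values (`m=1000`, `k=15`), the key `b"shared-secret"`, and all function names are illustrative assumptions, not the configuration studied in the paper. The `consistent` helper shows only the basic Bloom filter property (no false negatives) that any membership-based analysis can rely on; it assumes the observer can recompute encodings, which is an assumption of this sketch and not necessarily the paper's threat model, and it is not the authors' constraint satisfaction algorithm.

```python
import hashlib
import hmac


def bigrams(value: str) -> list[str]:
    """Split a name into overlapping 2-grams, padded with a space on each side."""
    padded = f" {value.lower()} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def bigram_positions(gram: str, m: int, k: int, key: bytes) -> set[int]:
    """Map one bigram to k bit positions via double hashing:
    g_i = (h1 + i * h2) mod m, where h1 and h2 come from two keyed HMACs."""
    h1 = int.from_bytes(hmac.new(key, gram.encode(), hashlib.sha1).digest(), "big")
    h2 = int.from_bytes(hmac.new(key, gram.encode(), hashlib.md5).digest(), "big")
    return {(h1 + i * h2) % m for i in range(k)}


def encode(value: str, m: int = 1000, k: int = 15,
           key: bytes = b"shared-secret") -> set[int]:
    """Bloom filter encoding of a name, represented as the set of bits set to 1.
    m, k, and key are hypothetical illustration values."""
    bits: set[int] = set()
    for gram in bigrams(value):
        bits |= bigram_positions(gram, m, k, key)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) over the set bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def consistent(candidate: str, encoding: set[int], **params) -> bool:
    """Bloom filters have no false negatives, so a candidate name can only
    explain an encoding if every bit it would set is also set there."""
    return encode(candidate, **params) <= encoding


if __name__ == "__main__":
    a, b = encode("smith"), encode("smyth")
    print(f"Dice(smith, smyth) = {dice(a, b):.3f}")
    print("smith consistent with its own encoding:", consistent("smith", a))
```

Similar spellings such as "smith" and "smyth" share most of their bigrams, so their encodings overlap heavily and score a high Dice value below 1. The same determinism that makes this comparison work is what exposes encodings to inference when the encoding parameters are poorly chosen.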






Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Mehmet Kuzu (1)
  • Murat Kantarcioglu (1)
  • Elizabeth Durham (2)
  • Bradley Malin (2)
  1. Dept. of Computer Science, University of Texas at Dallas, Richardson, USA
  2. Dept. of Biomedical Informatics, Vanderbilt University, Nashville, USA
