Who Is 1011011111…1110110010? Automated Cryptanalysis of Bloom Filter Encryptions of Databases with Several Personal Identifiers

  • Martin Kroll
  • Simone Steinmetzer
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 574)


We provide the first efficient cryptanalysis of Bloom filter encryptions of a database containing more than one personal identifier. The cryptanalysis is fully automated and reveals several weaknesses of existing encryption methods based on Bloom filters. In particular, it exploits the special representation of the hash functions as linear combinations of two hash functions f and g to detect Bloom filter encryptions of single bigrams (so-called atoms). The assignment of atoms to bigrams is obtained with a modified version of an algorithm originally proposed for the automated cryptanalysis of simple substitution ciphers. Using our approach, we were able to reconstruct 77.7% of the identifier values correctly. We point to further improvements of the basic Bloom filter approach whose privacy guarantees are worth investigating in future work.
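To make the notion of an atom concrete, the following is a minimal sketch of a Bloom filter encoding in which the k hash functions are linear combinations of two base hash functions f and g. It is an illustration under stated assumptions, not the encoding evaluated in the paper: the filter length `m`, the number of hash functions `k`, the choice of SHA-1 and SHA-256 for f and g, the index range i = 0, …, k − 1, the blank padding, and all function names are hypothetical.

```python
import hashlib


def bigrams(value):
    """Return the set of 2-grams of an identifier, padded with one blank on each side."""
    padded = f" {value.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def base_hashes(bigram, m):
    """Two base hash values f and g in [0, m). SHA-1 and SHA-256 are assumptions."""
    data = bigram.encode("utf-8")
    f = int.from_bytes(hashlib.sha1(data).digest(), "big") % m
    g = int.from_bytes(hashlib.sha256(data).digest(), "big") % m
    return f, g


def atom_positions(bigram, m=1000, k=15):
    """Bit positions set by one bigram (its atom): h_i = (f + i * g) mod m."""
    f, g = base_hashes(bigram, m)
    return {(f + i * g) % m for i in range(k)}


def encode(identifiers, m=1000, k=15):
    """Encode several identifiers of one record into a single Bloom filter."""
    bits = [0] * m
    for value in identifiers:
        for bigram in bigrams(value):
            for pos in atom_positions(bigram, m, k):
                bits[pos] = 1
    return bits


if __name__ == "__main__":
    record = encode(["smith", "john"])
    print("".join(map(str, record[:40])), "...")
    print(sorted(atom_positions("sm")))
```

In this sketch the positions of each atom form an arithmetic progression modulo `m`, determined entirely by the pair (f, g). This regular structure is the kind of property the abstract describes as being exploited to detect atoms. The final Bloom filter is the bitwise OR of the atoms of all bigrams in all identifiers of a record.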


Bloom filter, Privacy-preserving record linkage, Anonymity, Hash function, Cryptographic attack



Research of both authors was supported by the research grant SCHN 586/19-1 of the German Research Foundation (DFG) awarded to the head of the Research Methodology Group, Rainer Schnell. We thank him and the three anonymous reviewers for their helpful comments.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Research Methodology Group, University of Duisburg-Essen, Duisburg, Germany
