Who Is 1011011111\(\ldots \)1110110010? Automated Cryptanalysis of Bloom Filter Encryptions of Databases with Several Personal Identifiers
We provide the first efficient cryptanalysis of Bloom filter encryptions of a database containing more than one personal identifier. The cryptanalysis is fully automated and exposes several drawbacks of existing Bloom filter-based encryption methods. In particular, we exploit the special representation of the hash functions as linear combinations of two hash functions f and g to detect Bloom filter encryptions of single bigrams (so-called atoms). The assignment of atoms to bigrams is obtained via a modification of an algorithm originally proposed for the automated cryptanalysis of simple substitution ciphers. Using our approach, we were able to reconstruct 77.7% of the identifier values correctly. We point to further improvements of the basic Bloom filter approach whose privacy guarantees are worth investigating in future work.
Keywords: Bloom filter; Privacy-preserving record linkage; Anonymity; Hash function; Cryptographic attack
Research of both authors was supported by the research grant SCHN 586/19-1 of the German Research Foundation (DFG) awarded to the head of the Research Methodology Group, Rainer Schnell. We thank him and the three anonymous reviewers for their helpful comments.