Bloom Filter Bootstrap: Privacy-Preserving Estimation of the Size of an Intersection

  • Hiroaki Kikuchi
  • Jun Sakuma
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7964)


This paper proposes a new privacy-preserving scheme for estimating the size of the intersection of two secret sets. Given the inner product of the Bloom filters (BFs) of the two sets, the proposed scheme applies Bayesian estimation, assuming a beta distribution as the prior probability of the size to be estimated. The BF keeps the communication complexity low, while the Bayesian estimation improves the estimation accuracy.
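The abstract does not spell out the estimator. As a rough illustration of why the inner product of two BFs carries information about the intersection size, the sketch below builds two Bloom filters and estimates the intersection size from the filters' set-bit counts and their inner product, using the standard bit-count cardinality estimate plus inclusion-exclusion. This is a simpler, non-Bayesian estimator swapped in for illustration, not the authors' beta-prior Bayesian method. The filter size `M_BITS`, the hash count `K_HASHES`, the salted SHA-256 hashing, and all function names are assumptions. In the privacy-preserving setting the inner product would presumably be obtained through a secure two-party scalar-product protocol; here it is computed in the clear.

```python
import hashlib
import math

# Hypothetical parameters (not taken from the paper): m-bit filters, k hash functions.
M_BITS = 16384
K_HASHES = 4


def bloom_filter(items, m=M_BITS, k=K_HASHES):
    """Build an m-bit Bloom filter (list of 0/1) using k salted SHA-256 hashes."""
    bits = [0] * m
    for item in items:
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def inner_product(bf_a, bf_b):
    """Count positions set in both filters (done in the clear here, not securely)."""
    return sum(x & y for x, y in zip(bf_a, bf_b))


def estimate_count(set_bits, m=M_BITS, k=K_HASHES):
    """Standard estimate of the number of inserted items from the number of set bits."""
    if set_bits >= m:
        raise ValueError("filter is saturated; choose a larger m")
    return -(m / k) * math.log(1 - set_bits / m)


def estimate_intersection(bf_a, bf_b, m=M_BITS, k=K_HASHES):
    """Estimate the intersection size from the two popcounts and the inner product."""
    t_a, t_b = sum(bf_a), sum(bf_b)
    t_and = inner_product(bf_a, bf_b)
    t_or = t_a + t_b - t_and  # set bits of the union filter (A OR B)
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
    return (estimate_count(t_a, m, k) + estimate_count(t_b, m, k)
            - estimate_count(t_or, m, k))


if __name__ == "__main__":
    set_a = {f"id{i}" for i in range(0, 1000)}
    set_b = {f"id{i}" for i in range(500, 1500)}  # true intersection size: 500
    est = estimate_intersection(bloom_filter(set_a), bloom_filter(set_b))
    print(round(est))
```

With these hypothetical sizes the printed estimate should land near the true value of 500. The accuracy of this kind of plug-in estimate degrades as the filters fill up, so m and k have to be sized relative to the expected set sizes.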

A possible application of the proposed protocol is the analysis of an epidemiological dataset concerning two attributes, Helicobacter pylori infection and stomach cancer. Assuming that information on Helicobacter pylori infection and information on stomach cancer are collected separately, we demonstrate that the protocol allows a χ²-test to be performed without disclosing the contents of the two confidential databases.
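Once the size of the intersection (infected individuals who also have stomach cancer) is known or estimated, a χ²-test of independence needs only the four cells of a 2×2 contingency table, and the other three cells follow from the marginal counts. The sketch below assumes the total population size and the two marginal counts are available to the analyst; the function names and the counts in the example are hypothetical and are not data from the paper. It uses Pearson's statistic without continuity correction, and the one-degree-of-freedom tail probability P(χ² > x) = erfc(√(x/2)).

```python
import math


def chi_square_2x2(n_total, n_infected, n_cancer, n_both):
    """Pearson chi-square statistic for a 2x2 table built from marginals and the intersection."""
    a = n_both                                     # infected, cancer
    b = n_infected - n_both                        # infected, no cancer
    c = n_cancer - n_both                          # not infected, cancer
    d = n_total - n_infected - n_cancer + n_both   # not infected, no cancer
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n_total * (a * d - b * c) ** 2 / denom


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    stat = chi_square_2x2(n_total=1000, n_infected=400, n_cancer=100, n_both=60)
    print(f"chi2 = {stat:.2f}, p = {p_value_1df(stat):.2g}")
```

Since 3.841 is the 5% critical value for one degree of freedom, a statistic above it rejects independence at that level. When `n_both` is an estimate rather than an exact count, the test inherits that estimation error.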


Copyright information

© IFIP International Federation for Information Processing 2013

Authors and Affiliations

  • Hiroaki Kikuchi (1)
  • Jun Sakuma (2)
  1. Department of Frontier Media Science, School of Interdisciplinary Mathematical Sciences, Meiji University, Nakano-ku, Japan
  2. Graduate School of SIE, Computer Science Department, University of Tsukuba, Tsukuba, Japan
