Private Similarity Computation in Distributed Systems: From Cryptography to Differential Privacy

  • Mohammad Alaggan
  • Sébastien Gambs
  • Anne-Marie Kermarrec
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7109)


In this paper, we address the problem of computing the similarity between two users (according to their profiles) while preserving their privacy in a fully decentralized system under the passive adversary model. First, we introduce a two-party protocol for privately computing a threshold version of the similarity and apply it to well-known similarity measures such as the scalar product and the cosine similarity. The output of this protocol is a single bit indicating whether or not two users are similar beyond a predetermined threshold. Afterwards, we explore the computation of the exact and threshold similarity within the context of differential privacy. Differential privacy is a recent notion developed within the field of private data analysis. It guarantees that an adversary who observes the output of a differentially private mechanism gains only a negligible advantage (up to a privacy parameter) from the presence (or absence) of a particular item in the profile of a user. This provides a strong privacy guarantee that holds independently of any auxiliary knowledge the adversary might have. More specifically, we design several differentially private variants of the exact and threshold protocols that rely on the addition of random noise tailored to the sensitivity of the considered similarity measure. We also analyze their complexity as well as their impact on the utility of the resulting similarity measure. Finally, we provide experimental results validating the effectiveness of the proposed approach on real datasets.
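To make the idea of "noise tailored to the sensitivity" concrete, the following is a minimal sketch and not the protocol from the paper. It assumes binary profiles (so that adding or removing one item changes the scalar product by at most 1), uses the standard Laplace mechanism with scale sensitivity/ε, and computes everything in the clear, whereas the paper's protocols run between two parties under encryption. All function names and the `epsilon`, `threshold`, and `sensitivity` parameters are hypothetical and chosen for illustration only.

```python
import random


def scalar_product(a, b):
    """Scalar product of two equal-length profiles (assumed binary here)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x * y for x, y in zip(a, b))


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_scalar_product(a, b, epsilon, rng, sensitivity=1.0):
    """Exact similarity plus Laplace noise of scale sensitivity / epsilon.

    Assumption: sensitivity = 1 is appropriate for binary profiles that
    differ in a single item; other measures need their own bound.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return scalar_product(a, b) + laplace_noise(sensitivity / epsilon, rng)


def dp_threshold_similarity(a, b, threshold, epsilon, rng):
    """One-bit output: is the noisy similarity above the threshold?"""
    return dp_scalar_product(a, b, epsilon, rng) > threshold


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    alice = [1, 0, 1, 1, 0, 1]
    bob = [1, 1, 1, 0, 0, 1]
    print("exact:", scalar_product(alice, bob))
    print("noisy:", dp_scalar_product(alice, bob, epsilon=0.5, rng=rng))
    print("similar?", dp_threshold_similarity(alice, bob, 2, 0.5, rng))
```

A smaller `epsilon` gives a stronger privacy guarantee but a larger noise scale, which is the privacy and utility trade-off the abstract refers to. In a real deployment the random source would need to be cryptographically sound rather than Python's `random` module.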


Privacy, similarity measure, homomorphic encryption, differential privacy





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Mohammad Alaggan (1)
  • Sébastien Gambs (2)
  • Anne-Marie Kermarrec (3)
  1. Université Rennes 1 – IRISA, Rennes, France
  2. Université de Rennes 1 – INRIA/IRISA, Rennes, France
  3. INRIA Rennes Bretagne-Atlantique, Rennes, France
