Locality Sensitive Hashing with Extended Differential Privacy

Part of the Lecture Notes in Computer Science book series (LNSC, volume 12973)

Abstract

Extended differential privacy, a generalization of standard differential privacy (DP) using a general metric, has been widely studied to provide rigorous privacy guarantees while keeping high utility. However, existing works on extended DP are limited to a few metrics, such as the Euclidean metric. Consequently, they have only a small number of applications, such as location-based services and document processing.

In this paper, we propose a couple of mechanisms providing extended DP with a different metric: angular distance (or cosine distance). Our mechanisms are based on locality sensitive hashing (LSH), which can be applied to the angular distance and works well for personal data in a high-dimensional space. We theoretically analyze the privacy properties of our mechanisms, and prove extended DP for input data by taking into account that LSH preserves the original metric only approximately. We apply our mechanisms to friend matching based on high-dimensional personal data with angular distance in the local model, and evaluate our mechanisms using two real datasets. We show that LDP requires a very large privacy budget and that RAPPOR does not work in this application. Then we show that our mechanisms enable friend matching with high utility and rigorous privacy guarantees based on extended DP.
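As a rough illustration of the kind of pipeline described above, the sketch below hashes a vector with random-hyperplane LSH (the angular-distance scheme of [12]) and then perturbs each hash bit with randomized response. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the Gaussian hyperplanes, and the per-bit flip probability \(1/(1+e^{\varepsilon })\) are all chosen here for illustration.

```python
import math
import random


def random_hyperplanes(dim, kappa, rng):
    """Draw kappa random hyperplane normals (assumed i.i.d. standard Gaussian)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(kappa)]


def lsh_bits(x, planes):
    """kappa-bit hash: bit i records on which side of hyperplane i the vector x lies."""
    return [1 if sum(r_j * x_j for r_j, x_j in zip(r, x)) >= 0 else 0 for r in planes]


def bitwise_rr(bits, eps, rng):
    """Keep each bit with probability e^eps / (1 + e^eps), flip it otherwise."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return [b if rng.random() < keep else 1 - b for b in bits]


def angular_distance(x, y):
    """Angle between x and y divided by pi, so the value lies in [0, 1]."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cos) / math.pi


if __name__ == "__main__":
    rng = random.Random(0)
    planes = random_hyperplanes(dim=3, kappa=16, rng=rng)
    x = [1.0, 2.0, 0.0]
    hashed = lsh_bits(x, planes)
    print(hashed)
    print(bitwise_rr(hashed, eps=1.0, rng=rng))
```

For random-hyperplane hashing, two vectors disagree on a given bit with probability equal to their angle divided by \(\pi \), so the fraction of differing hash bits estimates the angular distance; the noise is then applied only to the hash, never to the raw vector.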

Keywords

  • Local differential privacy
  • Locality sensitive hashing
  • Angular distance
  • Extended differential privacy

The authors are ordered alphabetically. This work was supported by the French-Japanese project LOGIS within the Inria Equipes Associées program, by an Australian Government RTP Scholarship (2017278), by ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603), JST, and by JSPS KAKENHI Grant Number JP19H04113.


Notes

  1. In fact, since the channel on its own leaks nothing, there must be further information released in order to learn anything useful from this channel.

References

  1. Acharya, J., Sun, Z., Zhang, H.: Hadamard response: Estimating distributions privately, efficiently, and with little communication. In: AISTATS, pp. 1120–1129 (2019)

  2. Aggarwal, C.C.: Recommender Systems. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-319-29659-3

  3. Aghasaryan, A., Bouzid, M., Kostadinov, D., Kothari, M., Nandi, A.: On the use of LSH for privacy preserving personalization. In: TrustCom, pp. 362–371 (2013)

  4. Alvim, M.S., Chatzikokolakis, K., Palamidessi, C., Pazii, A.: Invited paper: local differential privacy on metric spaces: optimizing the trade-off with utility. In: CSF, pp. 262–267 (2018). https://doi.org/10.1109/CSF.2018.00026

  5. Andoni, A., Indyk, P., Laarhoven, T., Razenshteyn, I., Schmidt, L.: Practical and optimal LSH for angular distance. In: NIPS, pp. 1–9 (2015)

  6. Andoni, A., Indyk, P., Razenshteyn, I.: Approximate nearest neighbor search in high dimensions. In: ICM, pp. 3287–3318. World Scientific (2018)

  7. Andrés, M.E., Bordenabe, N.E., Chatzikokolakis, K., Palamidessi, C.: Geo-indistinguishability: differential privacy for location-based systems. In: CCS, pp. 901–914. ACM (2013). https://doi.org/10.1145/2508859.2516735

  8. Aumüller, M., Bourgeat, A., Schmurr, J.: Differentially private sketches for Jaccard similarity estimation. CoRR abs/2008.08134 (2020)

  9. Bordenabe, N.E., Chatzikokolakis, K., Palamidessi, C.: Optimal geo-indistinguishable mechanisms for location privacy. In: CCS, pp. 251–262 (2014)

  10. Brendel, W., Han, F., Marujo, L., Jie, L., Korolova, A.: Practical privacy-preserving friend recommendations on social networks. In: WWW, pp. 111–112 (2018)

  11. Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. J. Comput. Syst. Sci. 60, 630–659 (2000)

  12. Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: STOC, pp. 380–388 (2002)

  13. Chatzikokolakis, K., Andrés, M.E., Bordenabe, N.E., Palamidessi, C.: Broadening the scope of differential privacy using metrics. In: De Cristofaro, E., Wright, M. (eds.) PETS 2013. LNCS, vol. 7981, pp. 82–102. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39077-7_5

  14. Chen, L., Zhu, P.: Preserving the privacy of social recommendation with a differentially private approach. In: SmartCity, pp. 780–785. IEEE (2015)

  15. Chen, X., Liu, H., Yang, D.: Improved LSH for privacy-aware and robust recommender system with sparse data in edge environment. EURASIP J. Wirel. Commun. Netw. 2019(1), 1–11 (2019). https://doi.org/10.1186/s13638-019-1478-1

  16. Cheng, H., Qian, M., Li, Q., Zhou, Y., Chen, T.: An efficient privacy-preserving friend recommendation scheme for social network. IEEE Access 6, 56018–56028 (2018)

  17. Chow, R., Pathak, M.A., Wang, C.: A practical system for privacy-preserving collaborative filtering. In: ICDM Workshops, pp. 547–554 (2012)

  18. Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: SCG, pp. 253–262 (2004)

  19. Duchi, J.C., Jordan, M.I., Wainwright, M.J.: Local privacy and statistical minimax rates. In: FOCS, pp. 429–438 (2013)

  20. Dwork, C.: Differential privacy. In: ICALP, pp. 1–12 (2006)

  21. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14

  22. Dwork, C., Rothblum, G.N.: Concentrated differential privacy. CoRR abs/1603.01887 (2016)

  23. Erlingsson, Ú., Pihur, V., Korolova, A.: RAPPOR: randomized aggregatable privacy-preserving ordinal response. In: CCS, pp. 1054–1067 (2014)

  24. Fernandes, N., Dras, M., McIver, A.: Processing text for privacy: an information flow perspective. In: Havelund, K., Peleska, J., Roscoe, B., de Vink, E. (eds.) FM 2018. LNCS, vol. 10951, pp. 3–21. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-95582-7_1

  25. Fernandes, N., Dras, M., McIver, A.: Generalised differential privacy for text document processing. In: POST, pp. 123–148 (2019)

  26. Fernandes, N., Kawamoto, Y., Murakami, T.: Locality sensitive hashing with extended differential privacy. CoRR abs/2010.09393 (2020). https://arxiv.org/abs/2010.09393

  27. Ganta, S.R., Kasiviswanathan, S.P., Smith, A.: Composition attacks and auxiliary information in data privacy. In: KDD, pp. 265–273. ACM (2008)

  28. Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: VLDB, pp. 518–529 (1999)

  29. Hu, H., Dobbie, G., Salcic, Z., Liu, M., Zhang, J., Lyu, L., Zhang, X.: Differentially private locality sensitive hashing based federated recommender system. Concurr. Comput. Pract. Exp. 1–16 (2020)

  30. Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC, pp. 604–613 (1998)

  31. Kairouz, P., Bonawitz, K., Ramage, D.: Discrete distribution estimation under local privacy. In: ICML, pp. 2436–2444 (2016)

  32. Kamalaruban, P., Perrier, V., Asghar, H.J., Kaafar, M.A.: Not all attributes are created equal: \(d_x\)-private mechanisms for linear queries. In: Proceedings on Privacy Enhancing Technologies (PoPETs), vol. 2020, no. 1, pp. 103–125 (2020)

  33. Kawamoto, Y., Murakami, T.: On the anonymization of differentially private location obfuscation. In: ISITA, pp. 159–163. IEEE (2018)

  34. Kawamoto, Y., Murakami, T.: Local obfuscation mechanisms for hiding probability distributions. In: Sako, K., Schneider, S., Ryan, P.Y.A. (eds.) ESORICS 2019. LNCS, vol. 11735, pp. 128–148. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29959-0_7

  35. Li, M., Ruan, N., Qian, Q., Zhu, H., Liang, X., Yu, L.: SPFM: scalable and privacy-preserving friend matching in mobile clouds. IEEE Internet Things J. 4(2), 583–591 (2017)

  36. Liu, C., Mittal, P.: LinkMirage: enabling privacy-preserving analytics on social relationships. In: NDSS (2016)

  37. Liu, Z., Wang, Y.X., Smola, A.J.: Fast differentially private matrix factorization. In: RecSys, pp. 171–178 (2015)

  38. Ma, X., Ma, J., Li, H., Jiang, Q., Gao, S.: ARMOR: a trust-based privacy-preserving framework for decentralized friend recommendation in online social networks. Futur. Gener. Comput. Syst. 79, 82–94 (2018)

  39. Machanavajjhala, A., Kifer, D., Abowd, J.M., Gehrke, J., Vilhuber, L.: Privacy: theory meets practice on the map. In: ICDE, pp. 277–286. IEEE (2008)

  40. Machanavajjhala, A., Korolova, A., Sarma, A.D.: Personalized social recommendations - accurate or private? Proc. VLDB Endow. 4(7), 440–450 (2011)

  41. MovieLens 25m Dataset. https://grouplens.org/datasets/movielens/25m/. Accessed 2020

  42. Murakami, T., Hamada, K., Kawamoto, Y., Hatano, T.: Privacy-preserving multiple tensor factorization for synthesizing large-scale location traces with cluster-specific features. Proc. Priv. Enhancing Technol. 2021(2), 5–26 (2021)

  43. Murakami, T., Kawamoto, Y.: Utility-optimized local differential privacy mechanisms for distribution estimation. In: USENIX Security, pp. 1877–1894 (2019)

  44. Narayanan, A., Thiagarajan, N., Lakhani, M., Hamburg, M., Boneh, D., et al.: Location privacy via private proximity testing. In: NDSS, vol. 11 (2011)

  45. Nissim, K., Stemmer, U.: Clustering algorithms for the centralized and local models. In: Algorithmic Learning Theory, pp. 619–653 (2019)

  46. Qi, L., Zhang, X., Dou, W., Ni, Q.: A distributed locality-sensitive hashing-based approach for cloud service recommendation from multi-source data. IEEE J. Sel. Areas Commun. 35(11), 2616–2624 (2017)

  47. Samanthula, B.K., Cen, L., Jiang, W., Si, L.: Privacy-preserving and efficient friend recommendation in online social networks. Trans. Data Priv. 8(2), 141–171 (2015)

  48. Shin, H., Kim, S., Shin, J., Xiao, X.: Privacy enhanced matrix factorization for recommendation with local differential privacy. IEEE Trans. Knowl. Data Eng. 30(9), 1770–1782 (2018)

  49. Wang, J., Liu, W., Kumar, S., Chang, S.F.: Learning to hash for indexing big data - a survey. Proc. IEEE 104(1), 34–57 (2016)

  50. Wang, S., et al.: Mutual information optimally local private discrete distribution estimation. CoRR abs/1607.08025 (2016). https://arxiv.org/abs/1607.08025

  51. Wang, T., Blocki, J., Li, N., Jha, S.: Locally differentially private protocols for frequency estimation. In: USENIX Security, pp. 729–745 (2017)

  52. Warner, S.L.: Randomized response: a survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 60(309), 63–69 (1965)

  53. Xiang, Z., Ding, B., He, X., Zhou, J.: Linear and range counting under metric-based local differential privacy. In: ISIT, pp. 908–913 (2020)

  54. Yang, D., Qu, B., Yang, J., Cudre-Mauroux, P.: Revisiting user mobility and social relationships in LBSNs: a hypergraph embedding approach. In: WWW, pp. 2147–2157 (2019)

  55. Ye, M., Barg, A.: Optimal schemes for discrete distribution estimation under local differential privacy. In: ISIT, pp. 759–763 (2017)

  56. Zhang, Y., Gao, N., Chen, J., Tu, C., Wang, J.: PrivRec: user-centric differentially private collaborative filtering using LSH and KD. In: Yang, H., Pasupa, K., Leung, A.C.-S., Kwok, J.T., Chan, J.H., King, I. (eds.) ICONIP 2020. CCIS, vol. 1332, pp. 113–121. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-63820-7_13

Appendices

A Total Privacy Budgets in Extended DP and LDP

Table 1 shows total privacy budgets in extended DP and LDP calculated from Proposition 5 and the fact that the angular distance is 0.5 or smaller.

For example, when \(d_\theta =0.05\) and \(\kappa =10\), 20, and 50, the total privacy budget \(\xi =20\) in extended DP corresponds to the total privacy budget of 55, 79, and 120, respectively, in LDP.
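The conversion can be sketched numerically. Proposition 5 is not reproduced in this excerpt, so the code below rests on two assumptions that are inferred, not quoted: (i) \(\delta \) is tied to the deviation \(\alpha \) through the relative-entropy form of the Chernoff-Hoeffding bound, \(\delta = \exp (-\kappa D(d_\theta + \alpha \,\Vert \, d_\theta ))\), and (ii) the LDP budget is \(\varepsilon \kappa \), the worst case in which all \(\kappa \) hash bits differ. All function names are hypothetical.

```python
import math


def kl_bernoulli(p, q):
    """Relative entropy D(p || q) between Bernoulli(p) and Bernoulli(q), 0 < q < 1."""
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def tail_threshold(d, kappa, delta):
    """Smallest p >= d with exp(-kappa * D(p || d)) <= delta, found by bisection."""
    target = -math.log(delta) / kappa
    lo, hi = d, 1.0
    if kl_bernoulli(hi, d) < target:
        return 1.0  # the bound cannot reach delta; fall back to the trivial threshold
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if kl_bernoulli(mid, d) < target:
            lo = mid
        else:
            hi = mid
    return hi


def ldp_budget_from_xdp(xi, d, kappa, delta):
    """Assumed conversion: xi = eps * kappa * (d + alpha); the LDP budget is eps * kappa."""
    return xi / tail_threshold(d, kappa, delta)


if __name__ == "__main__":
    for kappa in (10, 20, 50):
        print(kappa, ldp_budget_from_xdp(xi=20.0, d=0.05, kappa=kappa, delta=0.01))
```

Under these assumptions the computed budgets for \(\kappa = 10\), 20, and 50 come out close to the 55, 79, and 120 quoted above, which suggests the assumptions are at least consistent with the table.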

Table 1. Total privacy budgets in extended DP (XDP) and LDP when \(d_\theta = 0.05\) or 0.1, \(\kappa = 10\), 20, or 50, and \(\delta = 0.01\).

B More Details on the Privacy Analyses

We show the relationships among CXDP, PXDP, and XDP as follows. See the preprint version [26] of this paper for the proofs.

Lemma 1

(CXDP \(\Rightarrow \) PXDP). Let \(\mu \in \mathbb {R}_{\ge 0}\), \(\tau \in \mathbb {R}_{>0}\), \(\lambda \in \mathbb {D}\mathcal {R}\), \( A _\lambda : \mathcal {X}\rightarrow \mathbb {D}\mathcal {Y}\), and d be a metric over \(\mathcal {X}\). Let \(\delta \in (0, 1]\), \(\varepsilon = \tau \sqrt{-2\ln \delta }\), and \(\xi (x, x') = \mu d(x, x')+\varepsilon \). If \( A _\lambda \) provides \((\mu , \tau , d)\)-CXDP, then it provides \((\xi , \delta )\)-PXDP.

Lemma 2

(PXDP \(\Rightarrow \) XDP). Let \(\lambda \in \mathbb {D}\mathcal {R}\), \( A _\lambda : \mathcal {X}\rightarrow \mathbb {D}\mathcal {Y}\), \(\xi : \mathcal {X}\times \mathcal {X}\rightarrow \mathbb {R}_{\ge 0}\), and \(\delta : \mathcal {X}\times \mathcal {X}\rightarrow [0,1]\). If \( A _\lambda \) provides \((\xi , \delta )\)-PXDP, then it provides \((\xi , \delta )\)-XDP.

Next, we present the proofs for some of the main results as follows. See the preprint [26] of this paper for the proofs of the other technical results.

Proof

(of Proposition 4). For a \(\kappa \)-bit LSH function \(H \in \mathcal {H}^\kappa \),

$$\begin{aligned} Q_H(\boldsymbol{x})[y]&= Q_{\mathsf{brr}}(H(\boldsymbol{x}))[y] \\&\le e^{\varepsilon d_{\mathcal {V}}(H(\boldsymbol{x}), H(\boldsymbol{x}'))} \, Q_{\mathsf{brr}}(H(\boldsymbol{x}'))[y] \quad \text {(by XDP of BRR)} \\&= e^{\varepsilon d_{\mathcal {V}}(H(\boldsymbol{x}), H(\boldsymbol{x}'))} \, Q_H(\boldsymbol{x}')[y] {.} \end{aligned}$$

Let Z be the random variable defined by \(Z {\mathop {=}\limits ^{\mathrm {def}}} d _{\mathcal {V}}(H(\boldsymbol{x}), H(\boldsymbol{x}'))\) where \(H = (h_1, h_2, \ldots , h_\kappa )\) is distributed over \(\mathcal {H}^\kappa \), i.e., the seeds of the \(\kappa \) LSH functions are chosen independently at random. Then \(0 \le Z \le \kappa \), and Z follows the binomial distribution with \(\kappa \) trials and mean \(\mathop {\mathbb {E}}\limits [Z] = \kappa d _{\mathcal {X}}(\boldsymbol{x}, \boldsymbol{x}')\). The random variable \(\varepsilon Z - \mathop {\mathbb {E}}\limits [\varepsilon Z]\) is therefore centered, i.e., \(\mathop {\mathbb {E}}\limits [\varepsilon Z - \mathop {\mathbb {E}}\limits [\varepsilon Z]] = 0\), and ranges over \([-\varepsilon \kappa d _{\mathcal {X}}(\boldsymbol{x}, \boldsymbol{x}'), \varepsilon \kappa (1 - d _{\mathcal {X}}(\boldsymbol{x}, \boldsymbol{x}'))]\), an interval of width \(\varepsilon \kappa \). Hence it follows from Hoeffding’s lemma that:

$$\mathop {\mathbb {E}}\limits [ \exp (t(\varepsilon Z - \mathop {\mathbb {E}}\limits [\varepsilon Z])) ] \le \exp \bigl ({\textstyle \frac{t^2}{8} \bigl (\varepsilon \kappa \bigr )^2}\bigr ) = \exp \bigl ({\textstyle \frac{t^2}{2} \bigl (\frac{\varepsilon \kappa }{2} \bigr )^2}\bigr ) {.} $$

Hence by definition, \(\varepsilon Z - \mathop {\mathbb {E}}\limits [\varepsilon Z]\) is \(\frac{\varepsilon \kappa }{2}\)-subgaussian. Therefore, the LSH-based mechanism \( Q _\mathsf{LSHRR}\) provides \((\varepsilon \kappa ,\frac{\varepsilon \kappa }{2}, d _{\mathcal {X}})\)-CXDP.    \(\square \)
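The subgaussian bound used in this proof can be checked numerically, since the moment generating function of a centered binomial has a closed form. The snippet below is a small sanity check and not part of the proof; the function names and the parameter grid are chosen here for illustration.

```python
import math


def centered_mgf(t, eps, kappa, d):
    """Exact E[exp(t * (eps*Z - E[eps*Z]))] for Z ~ Binomial(kappa, d)."""
    s = t * eps
    # MGF of Binomial(kappa, d) at s, shifted by the mean eps * kappa * d.
    return (1.0 - d + d * math.exp(s)) ** kappa * math.exp(-s * kappa * d)


def hoeffding_bound(t, eps, kappa):
    """Right-hand side exp(t^2 * (eps*kappa)^2 / 8) from Hoeffding's lemma."""
    return math.exp(t * t * (eps * kappa) ** 2 / 8.0)


if __name__ == "__main__":
    eps, kappa = 0.5, 20
    for d in (0.05, 0.3, 0.5):
        for t in (-1.0, -0.3, 0.2, 1.0):
            lhs = centered_mgf(t, eps, kappa, d)
            rhs = hoeffding_bound(t, eps, kappa)
            print(f"d={d} t={t}: mgf={lhs:.4g} bound={rhs:.4g} ok={lhs <= rhs}")
```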

Proof

(of Theorem 1). Let \(\alpha = \sqrt{\frac{-\ln \delta }{2\kappa }}\). Let Z be the random variable defined by \(Z {\mathop {=}\limits ^{\mathrm {def}}} d _{\mathcal {V}}(H(\boldsymbol{x}), H(\boldsymbol{x}'))\) where \(H = (h_1, h_2, \ldots , h_\kappa )\) is distributed over \(\mathcal {H}^\kappa \). Then Z follows the binomial distribution with mean \(\mathop {\mathbb {E}}\limits [Z] = \kappa d _{\mathcal {X}}(\boldsymbol{x}, \boldsymbol{x}')\). Hence it follows from the Chernoff-Hoeffding theorem that \(\mathop {\text {Pr}}\limits [ Z \ge \kappa ( d _{\mathcal {X}}(\boldsymbol{x}, \boldsymbol{x}') + \alpha ) ] \le \exp \bigl ( - 2 \kappa \alpha ^2 \bigr ) = \delta \). Multiplying both sides of the event by \(\varepsilon \) gives \(\mathop {\text {Pr}}\limits [ \varepsilon Z \ge \varepsilon \kappa d _{\mathcal {X}}(\boldsymbol{x}, \boldsymbol{x}') + \varepsilon ' \sqrt{\kappa } ] \le \delta \). Therefore \( Q _\mathsf{LSHRR}\) provides \((\xi , \delta )\)-PXDP. By Lemma 2, \( Q _\mathsf{LSHRR}\) provides \((\xi , \delta )\)-XDP.    \(\square \)
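The passage from the tail bound on \(Z\) to the one on \(\varepsilon Z\) is a rescaling followed by one line of algebra. Theorem 1 is not restated in this appendix, so the forms of \(\varepsilon '\) and \(\xi \) written below are inferred from the proof rather than quoted:

$$\begin{aligned} \varepsilon \kappa \alpha&= \varepsilon \kappa \sqrt{\frac{-\ln \delta }{2\kappa }} = \varepsilon \sqrt{\frac{-\ln \delta }{2}} \, \sqrt{\kappa } , \\ \varepsilon \kappa \bigl ( d_{\mathcal {X}}(\boldsymbol{x}, \boldsymbol{x}') + \alpha \bigr )&= \varepsilon \kappa d_{\mathcal {X}}(\boldsymbol{x}, \boldsymbol{x}') + \varepsilon ' \sqrt{\kappa } \quad \text {with } \varepsilon ' = \varepsilon \sqrt{\tfrac{-\ln \delta }{2}} . \end{aligned}$$

The proof is thus consistent with \(\xi (\boldsymbol{x}, \boldsymbol{x}') = \varepsilon \kappa d_{\mathcal {X}}(\boldsymbol{x}, \boldsymbol{x}') + \varepsilon ' \sqrt{\kappa }\). As a worked value, \(\delta = 0.01\) gives \(\sqrt{-\ln \delta / 2} \approx 1.52\), so \(\varepsilon ' \approx 1.52\, \varepsilon \).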

Proof

(of Proposition 5). Let Z be the random variable defined by \(Z {\mathop {=}\limits ^{\mathrm {def}}} d _{\mathcal {V}}(H(\boldsymbol{x}), H(\boldsymbol{x}'))\) where \(H = (h_1, h_2, \ldots , h_\kappa )\) is distributed over \(\mathcal {H}^\kappa \). By the Chernoff-Hoeffding theorem, \(\mathop {\text {Pr}}\limits [ Z \ge \kappa ( d _{\mathcal {X}}(\boldsymbol{x}, \boldsymbol{x}') + \alpha ) ] \le \delta _\alpha (\boldsymbol{x}, \boldsymbol{x}')\). Then \(\mathop {\text {Pr}}\limits [ \varepsilon Z \ge \xi _\alpha (\boldsymbol{x}, \boldsymbol{x}') ] \le \delta _\alpha (\boldsymbol{x}, \boldsymbol{x}')\). Therefore \( Q _\mathsf{LSHRR}\) provides \((\xi _\alpha , \delta _\alpha )\)-PXDP. By Lemma 2, \( Q _\mathsf{LSHRR}\) provides \((\xi _\alpha , \delta _\alpha )\)-XDP.    \(\square \)
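Both of the last two proofs rest on an upper-tail bound for the binomial variable \(Z\). A quick way to see how much slack such a bound leaves is to compare it with the exact binomial tail. The sketch below does this for the simple bound \(\exp (-2\kappa \alpha ^2)\) used in the proof of Theorem 1; the function names and the example parameters are assumptions made for illustration.

```python
import math


def binomial_upper_tail(kappa, d, threshold):
    """Exact Pr[Z >= threshold] for Z ~ Binomial(kappa, d)."""
    start = max(0, math.ceil(threshold))
    return sum(
        math.comb(kappa, k) * d ** k * (1.0 - d) ** (kappa - k)
        for k in range(start, kappa + 1)
    )


def hoeffding_tail_bound(kappa, alpha):
    """Bound exp(-2 * kappa * alpha^2) on Pr[Z >= kappa * (d + alpha)]."""
    return math.exp(-2.0 * kappa * alpha ** 2)


if __name__ == "__main__":
    kappa, d, delta = 20, 0.1, 0.01
    # Same choice of alpha as in the proof of Theorem 1.
    alpha = math.sqrt(-math.log(delta) / (2 * kappa))
    exact = binomial_upper_tail(kappa, d, kappa * (d + alpha))
    print("bound:", hoeffding_tail_bound(kappa, alpha))
    print("exact:", exact)
```

The exact tail is never larger than the bound, and the gap between the two indicates how conservative the resulting \(\delta \) is for a given \(\kappa \) and distance.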

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Fernandes, N., Kawamoto, Y., Murakami, T. (2021). Locality Sensitive Hashing with Extended Differential Privacy. In: Bertino, E., Shulman, H., Waidner, M. (eds) Computer Security – ESORICS 2021. ESORICS 2021. Lecture Notes in Computer Science, vol 12973. Springer, Cham. https://doi.org/10.1007/978-3-030-88428-4_28

  • DOI: https://doi.org/10.1007/978-3-030-88428-4_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88427-7

  • Online ISBN: 978-3-030-88428-4

  • eBook Packages: Computer Science, Computer Science (R0)