Exact and Efficient Bayesian Inference for Privacy Risk Quantification

  • Conference paper
Software Engineering and Formal Methods (SEFM 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14323)

Abstract

Data analysis has high value for both commercial and research purposes. However, disclosing analysis results may pose severe privacy risks to individuals. Privug is a method to quantify the privacy risks of data analytics programs by analyzing their source code. The method uses probability distributions to model attacker knowledge and Bayesian inference to update that knowledge based on observable outputs. Currently, Privug uses Markov Chain Monte Carlo (MCMC) to perform inference, which is a flexible but approximate solution. This paper presents an exact Bayesian inference engine based on multivariate Gaussian distributions to accurately and efficiently quantify privacy risks. The inference engine is implemented for a subset of Python programs that can be modeled as multivariate Gaussian models. We evaluate the method by analyzing privacy risks in programs that release public statistics. The evaluation shows that our method analyzes privacy risks accurately and efficiently, and that it outperforms existing methods. Furthermore, we demonstrate the use of our engine to analyze the effect of differential privacy on public statistics.
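The idea of exact inference over a multivariate Gaussian model can be illustrated with a minimal sketch. This is not the paper's engine or its API: the function names `extend_with_linear_output` and `condition_gaussian`, the prior over four ages, and the observed value are all hypothetical, chosen only to show the standard closed-form Gaussian conditioning rule that such an engine could rely on. The optional Gaussian output noise is likewise an assumption made here to stay within the Gaussian family, not a statement about which privacy mechanism the paper analyzes.

```python
import numpy as np

def extend_with_linear_output(mu, cov, weights, noise_var=0.0):
    """Append y = weights @ x + N(0, noise_var) to the Gaussian N(mu, cov)."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    w = np.asarray(weights, dtype=float)
    cross = cov @ w                      # Cov(x, y)
    y_var = w @ cov @ w + noise_var      # Var(y)
    joint_mu = np.append(mu, w @ mu)
    joint_cov = np.block([[cov, cross[:, None]],
                          [cross[None, :], np.array([[y_var]])]])
    return joint_mu, joint_cov

def condition_gaussian(mu, cov, obs_idx, obs_val):
    """Exact posterior of the unobserved coordinates given x[obs_idx] = obs_val."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    obs_idx = np.atleast_1d(obs_idx)
    obs_val = np.atleast_1d(np.asarray(obs_val, dtype=float))
    keep = np.setdiff1d(np.arange(mu.size), obs_idx)
    s_kk = cov[np.ix_(keep, keep)]
    s_ko = cov[np.ix_(keep, obs_idx)]
    s_oo = cov[np.ix_(obs_idx, obs_idx)]
    # gain = s_ko @ inv(s_oo), computed with a solve (s_oo is symmetric)
    gain = np.linalg.solve(s_oo, s_ko.T).T
    post_mu = mu[keep] + gain @ (obs_val - mu[obs_idx])
    post_cov = s_kk - gain @ s_ko.T
    return post_mu, post_cov

# Hypothetical attacker prior: four independent ages, each N(40, 10^2).
n = 4
prior_mu = np.full(n, 40.0)
prior_cov = 100.0 * np.eye(n)
mean_weights = np.full(n, 1.0 / n)       # the released statistic is the average age

# Exact posterior after observing a released average of 50.
joint_mu, joint_cov = extend_with_linear_output(prior_mu, prior_cov, mean_weights)
post_mu, post_cov = condition_gaussian(joint_mu, joint_cov, obs_idx=n, obs_val=50.0)

# Same release with additive Gaussian noise of variance 25 on the output.
noisy_mu, noisy_cov = extend_with_linear_output(prior_mu, prior_cov, mean_weights,
                                                noise_var=25.0)
post_mu_noisy, post_cov_noisy = condition_gaussian(noisy_mu, noisy_cov,
                                                   obs_idx=n, obs_val=50.0)

print(post_mu, np.diag(post_cov))
print(post_mu_noisy, np.diag(post_cov_noisy))
```

In this toy setting, the attacker's per-individual variance drops from 100 to 75 after seeing the exact average, and only to 87.5 when noise is added to the output, which is the kind of before/after comparison a privacy risk analysis would report. Because the posterior is obtained in closed form, no sampling error is involved.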

Work partially supported by funding from the topic Engineering Secure Systems of the Helmholtz Association (HGF), the KASTEL Security Research Labs and the Danish Villum Foundation through Villum Experiment project No. 0002302.

Author information

Corresponding author

Correspondence to Rasmus C. Rønneberg.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Rønneberg, R.C., Pardo, R., Wąsowski, A. (2023). Exact and Efficient Bayesian Inference for Privacy Risk Quantification. In: Ferreira, C., Willemse, T.A.C. (eds) Software Engineering and Formal Methods. SEFM 2023. Lecture Notes in Computer Science, vol 14323. Springer, Cham. https://doi.org/10.1007/978-3-031-47115-5_15

  • DOI: https://doi.org/10.1007/978-3-031-47115-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47114-8

  • Online ISBN: 978-3-031-47115-5

  • eBook Packages: Computer Science (R0)
