Abstract
Data analysis has high value for both commercial and research purposes. However, disclosing analysis results may pose severe privacy risks to individuals. Privug is a method for quantifying the privacy risks of data analytics programs by analyzing their source code. The method uses probability distributions to model attacker knowledge and Bayesian inference to update that knowledge based on observable outputs. Currently, Privug uses Markov Chain Monte Carlo (MCMC) to perform inference, which is flexible but approximate. This paper presents an exact Bayesian inference engine based on multivariate Gaussian distributions to quantify privacy risks accurately and efficiently. The inference engine is implemented for a subset of Python programs that can be modeled as multivariate Gaussian models. We evaluate the method by analyzing privacy risks in programs that release public statistics. The evaluation shows that our method analyzes privacy risks accurately and efficiently, and outperforms existing methods. Furthermore, we demonstrate the use of our engine to analyze the effect of differential privacy on public statistics.
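To make the idea of exact Gaussian inference concrete, the following is a minimal sketch of closed-form conditioning for a multivariate Gaussian, applied to a program that releases the mean of a few ages. It is not the paper's inference engine: the function name `condition_gaussian`, the prior parameters, and the observed value are all assumptions chosen for illustration, and only the standard Gaussian conditioning formula is used.

```python
import numpy as np

def condition_gaussian(mu, sigma, obs_idx, obs_value):
    """Condition N(mu, sigma) on x[obs_idx] == obs_value (hypothetical helper).

    Returns the exact posterior mean and covariance of the remaining variables.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    keep = [i for i in range(len(mu)) if i != obs_idx]
    s_ko = sigma[keep, obs_idx]      # covariance between kept and observed variable
    s_oo = sigma[obs_idx, obs_idx]   # variance of the observed variable
    post_mu = mu[keep] + s_ko * (obs_value - mu[obs_idx]) / s_oo
    post_sigma = sigma[np.ix_(keep, keep)] - np.outer(s_ko, s_ko) / s_oo
    return post_mu, post_sigma

# Assumed attacker prior: four independent ages, each N(35, 10^2).
n = 4
prior_mu = np.full(n, 35.0)
prior_sigma = np.eye(n) * 100.0

# The released statistic (the mean) is a linear function of the inputs,
# so the inputs and the output are jointly Gaussian.
A = np.vstack([np.eye(n), np.full((1, n), 1.0 / n)])
joint_mu = A @ prior_mu
joint_sigma = A @ prior_sigma @ A.T

# Attacker observes a released mean of 40 (illustrative value).
post_mu, post_sigma = condition_gaussian(joint_mu, joint_sigma, obs_idx=n, obs_value=40.0)
print(post_mu[0], post_sigma[0, 0])
```

Under these assumed numbers, the attacker's uncertainty about any single age drops from a variance of 100 to 75 after seeing the mean, and no sampling is needed to obtain that posterior.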
Keywords
- Privacy risk analysis
- Bayesian inference
- Probabilistic programming
Work partially supported by funding from the topic Engineering Secure Systems of the Helmholtz Association (HGF), the KASTEL Security Research Labs and the Danish Villum Foundation through Villum Experiment project No. 0002302.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rønneberg, R.C., Pardo, R., Wąsowski, A. (2023). Exact and Efficient Bayesian Inference for Privacy Risk Quantification. In: Ferreira, C., Willemse, T.A.C. (eds) Software Engineering and Formal Methods. SEFM 2023. Lecture Notes in Computer Science, vol 14323. Springer, Cham. https://doi.org/10.1007/978-3-031-47115-5_15
DOI: https://doi.org/10.1007/978-3-031-47115-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47114-8
Online ISBN: 978-3-031-47115-5
eBook Packages: Computer Science, Computer Science (R0)