
Hybrid statistical estimation of mutual information and its application to information flow

  • Original Article
Formal Aspects of Computing

Abstract

Analysis of a probabilistic system often requires learning the joint probability distribution of its random variables. Computing the exact distribution usually requires an exhaustive precise analysis of all executions of the system. To avoid the high computational cost of such an exhaustive search, statistical analysis has been studied as a way to efficiently obtain approximate estimates by analyzing only a small but representative subset of the system’s behavior. In this paper we propose a hybrid statistical estimation method that combines precise and statistical analyses to estimate mutual information, Shannon entropy, and conditional entropy, together with their confidence intervals. We show how to combine the analyses of different components of a discrete system, performed with different accuracy, to obtain an estimate for the whole system. The new method performs weighted statistical analysis with different sample sizes over different components and dynamically finds their optimal sample sizes. Moreover, it can reduce sample sizes by using prior knowledge about systems and a new abstraction-then-sampling technique based on qualitative analysis. To apply the method to the source code of a system, we show how to decompose the code into components and how to determine the analysis method for each component, giving an overview of the implementation of these techniques in the HyLeak tool. We demonstrate with case studies that the new method outperforms the state of the art in quantifying information leakage.
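The idea of combining precise and statistical analyses can be illustrated with a minimal Python sketch. It is not the HyLeak implementation and it omits the confidence intervals, bias correction, and optimal sample-size selection described in the abstract. It assumes that the joint distribution of the whole system is a weighted mixture of per-component joint distributions; the function names (`mutual_information`, `estimate_component`, `hybrid_joint`) and the two toy components are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def mutual_information(joint):
    """Return I(X;Y) in bits for a joint distribution {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def estimate_component(sampler, n, rng):
    """Empirical joint distribution of one component from n random executions."""
    counts = defaultdict(int)
    for _ in range(n):
        counts[sampler(rng)] += 1
    return {xy: c / n for xy, c in counts.items()}


def hybrid_joint(precise, sampled, rng):
    """Mix exactly analysed and sampled components into one joint distribution.

    precise: list of (weight, exact_joint) pairs
    sampled: list of (weight, sampler, sample_size) triples
    The weights are the probabilities of entering each component (assumed known).
    """
    joint = defaultdict(float)
    for weight, dist in precise:
        for xy, p in dist.items():
            joint[xy] += weight * p
    for weight, sampler, n in sampled:
        for xy, p in estimate_component(sampler, n, rng).items():
            joint[xy] += weight * p
    return dict(joint)


# Toy system (illustrative assumption): with probability 0.5 the output copies
# a uniform secret bit (analysed precisely); otherwise the output is an
# independent coin flip (analysed by sampling).
copy_component = {(0, 0): 0.5, (1, 1): 0.5}


def noise_component(rng):
    return (rng.randrange(2), rng.randrange(2))


rng = random.Random(0)
joint = hybrid_joint([(0.5, copy_component)], [(0.5, noise_component, 20000)], rng)
print(f"estimated leakage: {mutual_information(joint):.4f} bits")
```

For this toy system the exact mutual information is 1 − H(0.75) ≈ 0.189 bits, so the printed estimate should be close to that value. Only the second component contributes sampling error, which is the motivation for analysing precisely whatever can be analysed precisely.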



Acknowledgements

The authors thank the anonymous reviewers for their helpful comments. This work was supported by JSPS KAKENHI Grant Number JP17K12667, by JSPS and Inria under the project LOGIS of the Japan-France AYAME Program, by the MSR-Inria Joint Research Center, by the Sensation European grant, and by région Bretagne.

Author information

Corresponding author

Correspondence to Yusuke Kawamoto.

Additional information

Communicated by Connie Heitmeyer, Ana Cavalcanti, John Fitzgerald, and Stefania Gnesi


About this article


Cite this article

Biondi, F., Kawamoto, Y., Legay, A. et al. Hybrid statistical estimation of mutual information and its application to information flow. Form Asp Comp 31, 165–206 (2019). https://doi.org/10.1007/s00165-018-0469-z


  • DOI: https://doi.org/10.1007/s00165-018-0469-z
