Private Client-Side Profiling with Random Forests and Hidden Markov Models

  • George Danezis
  • Markulf Kohlweiss
  • Benjamin Livshits
  • Alfredo Rial
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7384)

Abstract

Nowadays, service providers gather fine-grained data about users to deliver personalized services, for example, through the use of third-party cookies or social network profiles. This poses a threat both to privacy, since the amount of information obtained is excessive for the purpose of customization, and authenticity, because those methods employed to gather data can be blocked and fooled.

In this paper we propose privacy-preserving profiling techniques, in which users perform the profiling task locally, reveal to service providers the result and prove its correctness. We address how our approach applies to tasks of both classification and pattern recognition. For the former, we describe client-side profiling based on random forests, where users, based on certified input data representing their activity, resolve a random forest and reveal the classification result to service providers. For the latter, we show how to match a stream of user activity to a regular expression, or how to assign it a probability using a hidden Markov model. Our techniques, based on the use of zero-knowledge proofs, can be composed with other protocols as part of the certification of a larger computation.

Keywords

Hide Markov Model Random Forest Signature Scheme Random Oracle Regular Language 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Almeida, J.B., Bangerter, E., Barbosa, M., Krenn, S., Sadeghi, A.-R., Schneider, T.: A Certifying Compiler for Zero-Knowledge Proofs of Knowledge Based on Σ-Protocols. In: Gritzalis, D., Preneel, B., Theoharidou, M. (eds.) ESORICS 2010. LNCS, vol. 6345, pp. 151–167. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  2. 2.
    Alsaid, A., Martin, D.: Detecting Web Bugs with Bugnosis: Privacy Advocacy through Education. In: Dingledine, R., Syverson, P.F. (eds.) PET 2002. LNCS, vol. 2482, pp. 13–26. Springer, Heidelberg (2003)CrossRefGoogle Scholar
  3. 3.
    Bangerter, E., Briner, T., Henecka, W., Krenn, S., Sadeghi, A.-R., Schneider, T.: Automatic Generation of Sigma-Protocols. In: Martinelli, F., Preneel, B. (eds.) EuroPKI 2009. LNCS, vol. 6391, pp. 67–82. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  4. 4.
    Baum, L.E., Petrie, T.: Statistical Inference for Probabilistic Functions of Finite State Markov Chains. The Annals of Mathematical Statistics 37(6), 1554–1563 (1966)MathSciNetMATHCrossRefGoogle Scholar
  5. 5.
    Belenkiy, M., Chase, M., Kohlweiss, M., Lysyanskaya, A.: P-signatures and Noninteractive Anonymous Credentials. In: Canetti, R. (ed.) TCC 2008. LNCS, vol. 4948, pp. 356–374. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  6. 6.
    Bellare, M., Goldreich, O.: On Defining Proofs of Knowledge. In: Brickell, E.F. (ed.) CRYPTO 1992. LNCS, vol. 740, pp. 390–420. Springer, Heidelberg (1993)Google Scholar
  7. 7.
    Bellare, M., Rogaway, P.: Random oracles are practical: A paradigm for designing efficient protocols. In: First ACM Conference on Computer and Communication Security, pp. 62–73. Association for Computing Machinery (1993)Google Scholar
  8. 8.
    Bosch, A., Zisserman, A., Muoz, X.: Image classification using random forests and ferns. In: IEEE 11th International Conference on Computer Vision, ICCV 2007, pp. 1–8. IEEE (2007)Google Scholar
  9. 9.
    Boudot, F.: Efficient Proofs that a Committed Number Lies in an Interval. In: Preneel, B. (ed.) EUROCRYPT 2000. LNCS, vol. 1807, pp. 431–444. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  10. 10.
    Brands, S.: Rapid Demonstration of Linear Relations Connected by Boolean Operators. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS, vol. 1233, pp. 318–333. Springer, Heidelberg (1997)Google Scholar
  11. 11.
    Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)MATHCrossRefGoogle Scholar
  12. 12.
    Camenisch, J.: Group Signature Schemes and Payment Systems Based on the Discrete Logarithm Problem. PhD thesis, ETH Zürich (1998)Google Scholar
  13. 13.
    Camenisch, J.L., Chaabouni, R., Shelat, A.: Efficient Protocols for Set Membership and Range Proofs. In: Pieprzyk, J. (ed.) ASIACRYPT 2008. LNCS, vol. 5350, pp. 234–252. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  14. 14.
    Camenisch, J., Kiayias, A., Yung, M.: On the Portability of Generalized Schnorr Proofs. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 425–442. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  15. 15.
    Camenisch, J., Krenn, S., Shoup, V.: A Framework for Practical Universally Composable Zero-Knowledge Protocols. In: Lee, D.H., Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 449–467. Springer, Heidelberg (2011)Google Scholar
  16. 16.
    Camenisch, J.L., Lysyanskaya, A.: An Efficient System for Non-transferable Anonymous Credentials with Optional Anonymity Revocation. In: Pfitzmann, B. (ed.) EUROCRYPT 2001. LNCS, vol. 2045, pp. 93–118. Springer, Heidelberg (2001)CrossRefGoogle Scholar
  17. 17.
    Camenisch, J.L., Lysyanskaya, A.: A Signature Scheme with Efficient Protocols. In: Cimato, S., Galdi, C., Persiano, G. (eds.) SCN 2002. LNCS, vol. 2576, pp. 268–289. Springer, Heidelberg (2003)CrossRefGoogle Scholar
  18. 18.
    Camenisch, J.L., Lysyanskaya, A.: Signature Schemes and Anonymous Credentials from Bilinear Maps. In: Franklin, M. (ed.) CRYPTO 2004. LNCS, vol. 3152, pp. 56–72. Springer, Heidelberg (2004)Google Scholar
  19. 19.
    Camenisch, J., Michels, M.: Proving in Zero-Knowledge that a Number Is the Product of Two Safe Primes. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 107–122. Springer, Heidelberg (1999)Google Scholar
  20. 20.
    Camenisch, J.L., Stadler, M.A.: Efficient Group Signature Schemes for Large Groups. In: Kaliski Jr., B.S. (ed.) CRYPTO 1997. LNCS, vol. 1294, pp. 410–424. Springer, Heidelberg (1997)Google Scholar
  21. 21.
    Chaum, D., Pedersen, T.P.: Wallet Databases with Observers. In: Brickell, E.F. (ed.) CRYPTO 1992. LNCS, vol. 740, pp. 89–105. Springer, Heidelberg (1993)Google Scholar
  22. 22.
    Cramer, R., Damgård, I.B., Schoenmakers, B.: Proof of Partial Knowledge and Simplified Design of Witness Hiding Protocols. In: Desmedt, Y.G. (ed.) CRYPTO 1994. LNCS, vol. 839, pp. 174–187. Springer, Heidelberg (1994)Google Scholar
  23. 23.
    Danezis, G., Livshits, B.: Towards ensuring client-side computational integrity. In: Cachin, C., Ristenpart, T. (eds.) CCSW, pp. 125–130. ACM (2011)Google Scholar
  24. 24.
    Díaz-Uriarte, R., De Andres, S.: Gene selection and classification of microarray data using random forest. BMC Bioinformatics 7(1), 3 (2006)CrossRefGoogle Scholar
  25. 25.
    Fiat, A., Shamir, A.: How to Prove Yourself: Practical Solutions to Identification and Signature Problems. In: Odlyzko, A.M. (ed.) CRYPTO 1986. LNCS, vol. 263, pp. 186–194. Springer, Heidelberg (1987)Google Scholar
  26. 26.
    Fredrikson, M., Livshits, B.: Repriv: Re-imagining content personalization and in-browser privacy. In: IEEE Symposium on Security and Privacy, pp. 131–146. IEEE Computer Society (2011)Google Scholar
  27. 27.
    Fujisaki, E., Okamoto, T.: Statistical Zero Knowledge Protocols to Prove Modular Polynomial Relations. In: Kaliski Jr., B.S. (ed.) CRYPTO 1997. LNCS, vol. 1294, pp. 16–30. Springer, Heidelberg (1997)Google Scholar
  28. 28.
    Goldwasser, S., Micali, S., Rivest, R.: A digital signature scheme secure against adaptive chosen-message attacks. SIAM J. Comput. 17(2), 281–308 (1988)MathSciNetMATHCrossRefGoogle Scholar
  29. 29.
    Groth, J.: Non-interactive Zero-Knowledge Arguments for Voting. In: Ioannidis, J., Keromytis, A.D., Yung, M. (eds.) ACNS 2005. LNCS, vol. 3531, pp. 467–482. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  30. 30.
    Guha, S., Cheng, B., Francis, P.: Privad: Practical Privacy in Online Advertising. In: Proceedings of the 8th Symposium on Networked Systems Design and Implementation (NSDI), Boston, MA (March 2011)Google Scholar
  31. 31.
    Juang, B.: Hidden markov models. Encyclopedia of Telecommunications (1985)Google Scholar
  32. 32.
    Karplus, K., Barrett, C., Hughey, R.: Hidden markov models for detecting remote protein homologies. Bioinformatics 14(10), 846–856 (1998)CrossRefGoogle Scholar
  33. 33.
    Kunz-Jacques, S., Martinet, G., Poupard, G., Stern, J.: Cryptanalysis of an Efficient Proof of Knowledge of Discrete Logarithm. In: Yung, M., Dodis, Y., Kiayias, A., Malkin, T. (eds.) PKC 2006. LNCS, vol. 3958, pp. 27–43. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  34. 34.
    Lenstra, A.K., Hughes, J.P., Augier, M., Bos, J.W., Kleinjung, T., Wachter, C.: Ron was wrong, whit is right. Cryptology ePrint Archive, Report 2012/064 (2012), http://eprint.iacr.org/
  35. 35.
    Levine, B.N., Shields, C., Margolin, N.B.: A survey of solutions to the sybil attack (2006)Google Scholar
  36. 36.
    Magkos, E., Maragoudakis, M., Chrissikopoulos, V., Gritzalis, S.: Accurate and large-scale privacy-preserving data mining using the election paradigm. Data & Knowledge Engineering 68(11), 1224–1236 (2009)CrossRefGoogle Scholar
  37. 37.
    Pedersen, T.P.: Non-Interactive and Information-Theoretic Secure Verifiable Secret Sharing. In: Feigenbaum, J. (ed.) CRYPTO 1991. LNCS, vol. 576, pp. 129–140. Springer, Heidelberg (1992)Google Scholar
  38. 38.
    Pinkas, B.: Cryptographic techniques for privacy-preserving data mining. SIGKDD Explorations 4(2), 12–19 (2002)CrossRefGoogle Scholar
  39. 39.
    Rabiner, L.: A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286 (1989)CrossRefGoogle Scholar
  40. 40.
    Reznichenko, A., Guha, S., Francis, P.: Auctions in Do-Not-Track Compliant Internet Advertising. In: Proceedings of the 18th ACM Conference on Computer and Communications Security (CCS), Chicago, IL (October 2011)Google Scholar
  41. 41.
    Rial, A., Danezis, G.: Privacy-Preserving Smart Metering. In: Proceedings of the 11th ACM Workshop on Privacy in the Electronic Society (WPES 2011). ACM, Chicago (2011)Google Scholar
  42. 42.
    Schnorr, C.: Efficient signature generation for smart cards. Journal of Cryptology 4(3), 239–252 (1991)MathSciNetCrossRefGoogle Scholar
  43. 43.
    Toubiana, V., Narayanan, A., Boneh, D., Nissenbaum, H., Barocas, S.: Adnostic: Privacy preserving targeted advertising. In: NDSS (2010)Google Scholar
  44. 44.
    Vaidya, J., Clifton, C., Kantarcioglu, M., Patterson, A.S.: Privacy-preserving decision trees over vertically partitioned data. TKDD 2(3) (2008)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • George Danezis
    • 1
  • Markulf Kohlweiss
    • 1
  • Benjamin Livshits
    • 1
  • Alfredo Rial
    • 2
  1. 1.Microsoft ResearchUSA
  2. 2.IBBT and KU Leuven, ESAT-COSICBelgium

Personalised recommendations