What’s the Gist? Privacy-Preserving Aggregation of User Profiles

  • Igor Bilogrevic
  • Julien Freudiger
  • Emiliano De Cristofaro
  • Ersin Uzun
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8713)


Over the past few years, online service providers have started gathering increasing amounts of personal information to build user profiles and monetize them with advertisers and data brokers. Users have little control over what information is processed and are often left with an all-or-nothing decision between receiving free services and refusing to be profiled. This paper explores an alternative approach where users only disclose an aggregate model (the “gist”) of their data. We aim to preserve data utility while simultaneously providing user privacy. We show that this approach can be efficiently supported by letting users contribute encrypted and differentially-private data to an aggregator. The aggregator combines the encrypted contributions and can only extract an aggregate model of the underlying data. We evaluate our framework on a dataset of 100,000 U.S. users obtained from the U.S. Census Bureau and show that (i) it provides accurate aggregates with as few as 100 users, (ii) it can generate revenue for both users and data brokers, and (iii) its overhead is appreciably low.
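The abstract does not spell out the protocol, so the following is only a toy sketch of the general idea (encrypted, differentially-private contributions whose sum is all the aggregator can recover), not the authors' construction. It assumes: a hypothetical trusted dealer that gives each user additive masking keys summing to zero modulo 2^32; each user's profile attribute encoded as a one-hot histogram; and discrete Laplace (two-sided geometric) noise added locally by every user. All names (`dealer_keys`, `encrypt_contribution`, `aggregate`, `MODULUS`) are illustrative.

```python
import math
import random

# Assumed plaintext/ciphertext space for this toy scheme (not from the paper).
MODULUS = 2 ** 32


def dealer_keys(n_users, n_bins, rng):
    """Hypothetical trusted dealer: one key vector per user, summing to 0 mod MODULUS per bin."""
    keys = [[rng.randrange(MODULUS) for _ in range(n_bins)] for _ in range(n_users - 1)]
    # The last user's key cancels the others so the masks vanish in the sum.
    keys.append([(-sum(k[b] for k in keys)) % MODULUS for b in range(n_bins)])
    return keys


def two_sided_geometric(alpha, rng):
    """Discrete Laplace sample with P(k) proportional to alpha ** abs(k), 0 < alpha < 1."""
    def geometric():
        # Failures before the first success, where each trial fails with probability alpha.
        u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is defined
        return int(math.floor(math.log(u) / math.log(alpha)))
    return geometric() - geometric()


def encrypt_contribution(bin_index, n_bins, key, epsilon, rng):
    """User side: one-hot histogram of a single attribute, noised, then masked."""
    # Changing a user's bin alters two entries of the one-hot vector (L1 sensitivity 2).
    alpha = math.exp(-epsilon / 2)
    ciphertext = []
    for b in range(n_bins):
        noisy = (1 if b == bin_index else 0) + two_sided_geometric(alpha, rng)
        ciphertext.append((noisy + key[b]) % MODULUS)
    return ciphertext


def aggregate(ciphertexts):
    """Aggregator side: sums ciphertexts; the keys cancel, leaving a noisy histogram."""
    n_bins = len(ciphertexts[0])
    totals = [sum(c[b] for c in ciphertexts) % MODULUS for b in range(n_bins)]
    # Interpret values in the upper half of the range as negative (noise can push a bin below 0).
    return [t - MODULUS if t >= MODULUS // 2 else t for t in totals]


if __name__ == "__main__":
    rng = random.Random(7)
    n_users, n_bins = 100, 4
    true_bins = [rng.randrange(n_bins) for _ in range(n_users)]
    keys = dealer_keys(n_users, n_bins, rng)
    cts = [encrypt_contribution(b, n_bins, k, 1.0, rng) for b, k in zip(true_bins, keys)]
    print("true counts: ", [true_bins.count(b) for b in range(n_bins)])
    print("noisy counts:", aggregate(cts))
```

In this sketch each ciphertext on its own looks uniformly random to an aggregator that does not know the keys, and the decrypted sum is already noised. Because every user adds noise independently, the absolute noise per bin grows roughly with the square root of the number of users while the counts grow linearly, so relative accuracy improves as more users contribute.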


Keywords: Privacy, Secure Computation, Differential Privacy, User Profiling



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Igor Bilogrevic (1)
  • Julien Freudiger (2)
  • Emiliano De Cristofaro (3)
  • Ersin Uzun (2)

  1. Google, Switzerland
  2. PARC, USA
  3. University College London, UK
