Abstract
Over the past few years, online service providers have started gathering increasing amounts of personal information to build user profiles and monetize them with advertisers and data brokers. Users have little control over what information is processed and are often left with an all-or-nothing decision between receiving free services and refusing to be profiled. This paper explores an alternative approach in which users disclose only an aggregate model (the “gist”) of their data. We aim to preserve data utility while simultaneously providing user privacy. We show that this approach can be efficiently supported by letting users contribute encrypted and differentially private data to an aggregator. The aggregator combines the encrypted contributions and can extract only an aggregate model of the underlying data. We evaluate our framework on a dataset of 100,000 U.S. users obtained from the U.S. Census Bureau and show that (i) it provides accurate aggregates with as few as 100 users, (ii) it can generate revenue for both users and data brokers, and (iii) it incurs low overhead.
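The aggregation idea in the abstract can be illustrated with a small Python sketch. It is not the paper's construction: the encryption is replaced by a toy additive-masking scheme in which the users' masks sum to zero, and every name and parameter here (`MODULUS`, `SCALE`, `zero_sum_masks`, `user_contribution`, `aggregate`, the choice of `epsilon`) is an assumption made for illustration. Each user perturbs a private 0/1 attribute with Laplace noise and masks the result; the aggregator sees only masked values, yet their sum reveals the noisy total.

```python
import random

MODULUS = 2**61 - 1  # hypothetical modulus for the toy masking scheme
SCALE = 1000         # fixed-point factor so noisy reals can be masked as integers


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def zero_sum_masks(n, rng):
    """Return n integer masks whose sum is 0 modulo MODULUS.

    Stand-in for key setup; a real scheme would not let one party see all masks.
    """
    masks = [rng.randrange(MODULUS) for _ in range(n - 1)]
    masks.append(-sum(masks) % MODULUS)
    return masks


def user_contribution(value, mask, epsilon, sensitivity, rng):
    """Add Laplace noise to the user's value, then hide it behind the mask."""
    noisy = value + laplace_noise(sensitivity / epsilon, rng)
    encoded = round(noisy * SCALE)
    return (encoded + mask) % MODULUS


def aggregate(contributions):
    """Sum masked contributions; the masks cancel, leaving the noisy total."""
    total = sum(contributions) % MODULUS
    if total > MODULUS // 2:  # map back to a signed value
        total -= MODULUS
    return total / SCALE


if __name__ == "__main__":
    rng = random.Random(42)
    values = [rng.randint(0, 1) for _ in range(100)]  # one private bit per user
    masks = zero_sum_masks(len(values), rng)
    sent = [user_contribution(v, m, 1.0, 1.0, rng) for v, m in zip(values, masks)]
    print("true count:", sum(values))
    print("aggregate seen by the aggregator:", aggregate(sent))
```

An individual masked contribution looks like a uniformly random number, so in this toy setting the aggregator learns nothing from any single message; only the sum over all users is meaningful, and that sum already carries the users' noise.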
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Bilogrevic, I., Freudiger, J., De Cristofaro, E., Uzun, E. (2014). What’s the Gist? Privacy-Preserving Aggregation of User Profiles. In: Kutyłowski, M., Vaidya, J. (eds) Computer Security - ESORICS 2014. ESORICS 2014. Lecture Notes in Computer Science, vol 8713. Springer, Cham. https://doi.org/10.1007/978-3-319-11212-1_8
DOI: https://doi.org/10.1007/978-3-319-11212-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11211-4
Online ISBN: 978-3-319-11212-1
eBook Packages: Computer Science (R0)