Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Privacy-Preserving Data Analytics

  • Do Le Quoc
  • Martin Beck
  • Pramod Bhatotia
  • Ruichuan Chen
  • Christof Fetzer
  • Thorsten Strufe
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_152-1

Abstract

Real-time processing of user data streams in online services creates a tension between users and analysts: users want stronger privacy, while analysts want higher-utility data analytics in real time. To resolve this tension, this entry describes the design, implementation, and evaluation of PrivApprox, a data analytics system for privacy-preserving stream processing. PrivApprox provides three important properties: (i) privacy: a zero-knowledge privacy guarantee for users, a privacy bound tighter than state-of-the-art differential privacy; (ii) utility: an interface that lets data analysts systematically explore the trade-offs between output accuracy (with error estimation) and the query execution budget; and (iii) latency: near real-time stream processing based on a scalable "synchronization-free" distributed architecture. The key idea behind PrivApprox is to combine two techniques: sampling (used for approximate computation) and randomized response (used for privacy-preserving analytics). The two techniques are complementary: together they achieve stronger privacy guarantees and also improve the performance of stream analytics.
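The combination of sampling and randomized response described above can be illustrated with a minimal sketch. It assumes the classic two-coin form of randomized response for a yes/no query and simple per-client coin-flip sampling. The function names and the parameters `s` (sampling probability), `p` (probability of answering truthfully), and `q` (probability of answering "yes" when not truthful) are illustrative assumptions, not names taken from the abstract, and the sketch is not the PrivApprox implementation.

```python
import random


def randomized_response(truth: bool, p: float, q: float, rng: random.Random) -> bool:
    """Answer truthfully with probability p; otherwise answer 'yes' with probability q."""
    if rng.random() < p:
        return truth
    return rng.random() < q


def collect(answers, s: float, p: float, q: float, rng: random.Random):
    """Each client takes part with probability s, then randomizes its own answer."""
    return [randomized_response(a, p, q, rng) for a in answers if rng.random() < s]


def estimate_yes(responses, population: int, p: float, q: float) -> float:
    """Estimate the number of true 'yes' answers in the whole population."""
    n = len(responses)
    if n == 0:
        raise ValueError("no responses were sampled")
    observed_yes = sum(responses)
    # E[observed_yes] = p * true_yes_in_sample + (1 - p) * q * n,
    # so invert that relation to remove the randomization bias.
    sample_yes = (observed_yes - (1 - p) * q * n) / p
    # Scale the sample estimate up to the full population.
    return sample_yes * population / n


if __name__ == "__main__":
    rng = random.Random(42)
    truth = [i < 30_000 for i in range(100_000)]  # 30% of clients hold a true "yes"
    responses = collect(truth, s=0.6, p=0.7, q=0.5, rng=rng)
    print(round(estimate_yes(responses, len(truth), p=0.7, q=0.5)))
```

In this sketch the analyst sees only randomized answers from a random subset of clients, yet can still recover an unbiased estimate of the aggregate. Lowering `s` or `p` makes each individual answer less revealing and reduces the data to process, at the cost of a noisier estimate.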



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Do Le Quoc (1)
  • Martin Beck (1)
  • Pramod Bhatotia (2)
  • Ruichuan Chen (3)
  • Christof Fetzer (1)
  • Thorsten Strufe (1)

  1. TU Dresden, Dresden, Germany
  2. University of Edinburgh and Alan Turing Institute, Edinburgh, UK
  3. Nokia Bell Labs, New Jersey, USA

Section editors and affiliations

  • Asterios Katsifodimos (1)
  • Pramod Bhatotia (2)

  1. Delft University of Technology, Delft, Netherlands
  2. School of Informatics, University of Edinburgh, Edinburgh, United Kingdom