A Peer-to-Peer Protocol and System Architecture for Privacy-Preserving Statistical Analysis

  • Katerina Zamani
  • Angelos Charalambidis
  • Stasinos Konstantopoulos
  • Maria Dagioglou
  • Vangelis Karkaletsis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9817)

Abstract

The insights gained by the large-scale analysis of health-related data can have an enormous impact on public health and medical research, but access to such personal and sensitive data poses serious privacy implications for the data provider and imposes a heavy data-security and administrative burden on the data consumer. In this paper we present an architecture that fills the gap between the statistical tools ubiquitously used in medical research on the one hand, and privacy-preserving data mining methods on the other. This architecture defines the primitive instructions needed to re-implement the elementary statistical methods so that they access data only via a privacy-preserving protocol. The advantage is that more complex analysis and visualisation tools built upon these elementary methods can remain unaffected. Furthermore, we introduce RASSP, a secure summation protocol that implements the primitive instructions defined by the architecture. An open-source reference implementation of this architecture is provided for the R language. We use these results to argue that the tension between medical research and privacy requirements can be technically alleviated, and we outline a research plan towards a system that covers further requirements on computational efficiency and on the trust that the medical researcher can place in the statistical results it produces.
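The abstract's central idea is that elementary statistics can be rebuilt on top of a single privacy-preserving primitive, secure summation, so that higher-level tools need not change. The sketch below illustrates that idea with the classic ring-based secure-sum scheme, not with RASSP itself, whose details the abstract does not give. The function names `secure_sum`, `local_aggregates` and `private_mean_and_variance`, the modulus, and the sample data are hypothetical; all parties are simulated in one process, hold non-negative integer values, and are assumed to be honest-but-curious and non-colluding.

```python
import secrets

# Hypothetical modulus for the masked arithmetic; it must exceed any true sum.
MODULUS = 2**61 - 1


def secure_sum(private_values, modulus=MODULUS):
    """Classic ring-based secure sum (a generic scheme, not RASSP).

    The initiator starts the running total with a random mask, each party
    adds its private value modulo `modulus`, and the initiator removes the
    mask at the end. Each party only ever sees a masked running total.
    """
    mask = secrets.randbelow(modulus)
    running = mask
    for value in private_values:  # each iteration simulates one party in the ring
        running = (running + value) % modulus
    return (running - mask) % modulus


def local_aggregates(records):
    """Computed locally by each data provider: count, sum and sum of squares."""
    return len(records), sum(records), sum(x * x for x in records)


def private_mean_and_variance(parties):
    """Mean and sample variance obtained only through secure sums of aggregates."""
    aggregates = [local_aggregates(records) for records in parties]
    n = secure_sum([a[0] for a in aggregates])
    s = secure_sum([a[1] for a in aggregates])
    ss = secure_sum([a[2] for a in aggregates])
    mean = s / n
    variance = (ss - s * s / n) / (n - 1)  # unbiased sample variance
    return mean, variance


if __name__ == "__main__":
    # Three hypothetical data providers, each holding a few patient ages.
    parties = [[34, 51, 47], [62, 29], [45, 58, 40]]
    print(private_mean_and_variance(parties))
```

The design point is that `private_mean_and_variance` never touches individual records: it consumes only the results of a summation primitive, so swapping the toy `secure_sum` for a stronger protocol would leave the statistical layer unchanged. Real values with fractional parts would additionally need a fixed-point encoding before masking.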

Keywords

Privacy-preserving statistical analysis · Secure summation protocol · Statistical processing of health records

Copyright information

© IFIP International Federation for Information Processing 2016

Authors and Affiliations

  • Institute of Informatics and Telecommunications, NCSR ‘Demokritos’, Agia Paraskevi, Greece (all authors)