Privacy-Preserving Cloud-Based Statistical Analyses on Sensitive Categorical Data

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9880)

Abstract

We consider the problem of privacy-preserving cloud-based statistical computation on sensitive categorical data. Specifically, we focus on protocols to obtain the contingency matrix and the sample covariance matrix of a categorical data set. A multi-cloud is used not only to store the sensitive data but also to perform computations on them. However, the multi-cloud is semi-honest: it follows the protocols but is not authorized to learn the sensitive data. Hence, the multi-cloud must store and process the data in a privacy-preserving format, which we choose to be vertical splitting among the various clouds. We compare our proposals, which are based on the secure scalar product, against a benchmark protocol consisting of downloading the data and computing locally.
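
As an illustration of why a scalar product is a natural building block here, the following minimal Python sketch computes a contingency table for two attributes that are assumed to be held by two different clouds (vertical splitting). Each cell count equals the scalar product of two 0/1 indicator vectors, one per cloud. This is not the paper's protocol: the function names and the toy data are hypothetical, and `scalar_product` is a plain, insecure stand-in for whatever secure scalar product protocol would be used in practice.

```python
def indicator(column, category):
    """Return the 0/1 vector marking which records take the given category."""
    return [1 if value == category else 0 for value in column]


def scalar_product(u, v):
    """Plain dot product.

    Assumption: in a privacy-preserving setting this call would be replaced
    by a secure scalar product protocol run between the two clouds, so that
    neither cloud sees the other's vector. Here it is computed in the clear.
    """
    return sum(a * b for a, b in zip(u, v))


def contingency_table(col_x, col_y):
    """Build the contingency table of two categorical columns.

    The count for cell (a, b) is the scalar product of the indicator vector
    of `a` in col_x and the indicator vector of `b` in col_y.
    """
    cats_x = sorted(set(col_x))
    cats_y = sorted(set(col_y))
    return {
        (a, b): scalar_product(indicator(col_x, a), indicator(col_y, b))
        for a in cats_x
        for b in cats_y
    }


# Hypothetical toy data: one attribute per cloud, same record order.
cloud1_blood_type = ["A", "B", "A", "O", "A"]
cloud2_diagnosis = ["flu", "flu", "cold", "flu", "flu"]

table = contingency_table(cloud1_blood_type, cloud2_diagnosis)
for cell, count in sorted(table.items()):
    print(cell, count)
```

The sketch needs one scalar product per pair of categories, so the number of protocol runs grows with the product of the two attributes' category counts; the benchmark mentioned above avoids this by downloading the data and counting locally.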

Keywords

Data splitting; Privacy; Categorical data; Cloud computing; Contingency tables; Distance covariance

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Sara Ricci (1)
  • Josep Domingo-Ferrer (1)
  • David Sánchez (1)
  1. UNESCO Chair in Data Privacy, Department of Computer Science and Mathematics, Universitat Rovira i Virgili, Tarragona, Catalonia
