Privacy-Preserving Cloud-Based Statistical Analyses on Sensitive Categorical Data

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9880)


We consider the problem of privacy-preserving cloud-based statistical computation on sensitive categorical data. Specifically, we focus on protocols to obtain the contingency matrix and the sample covariance matrix of a categorical data set. A multi-cloud is used not only to store the sensitive data but also to perform computations on them. However, the multi-cloud is semi-honest: it follows the protocols but is not authorized to learn the sensitive data. Hence, the multi-cloud must store and process the data in a privacy-preserving format, which we choose to be vertical splitting among the various clouds. We compare our proposals, which are based on the secure scalar product, against a benchmark protocol consisting of downloading the data and computing locally.
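As a rough illustration of why a scalar product is a natural building block here, the following sketch computes a contingency table for two categorical attributes that are assumed to be held by different clouds under vertical splitting. Each cell count is the scalar product of two 0/1 indicator vectors. The function names and the toy data are hypothetical, and `scalar_product` is computed in the clear; it only marks the place where a secure scalar product protocol would be used, and is not the paper's protocol.

```python
def indicator(column, category):
    """Return a 0/1 vector marking the records whose value equals `category`."""
    return [1 if value == category else 0 for value in column]


def scalar_product(x, y):
    """Plain dot product (assumption: stands in for a secure scalar product)."""
    return sum(a * b for a, b in zip(x, y))


def contingency_table(col_a, col_b):
    """Count co-occurrences of categories of two attributes via scalar products."""
    cats_a = sorted(set(col_a))
    cats_b = sorted(set(col_b))
    table = {
        (a, b): scalar_product(indicator(col_a, a), indicator(col_b, b))
        for a in cats_a
        for b in cats_b
    }
    return cats_a, cats_b, table


# Toy data: each list is one attribute, imagined to be stored in a different cloud.
blood_type = ["A", "B", "A", "O", "A", "B"]
diagnosis = ["flu", "flu", "cold", "cold", "flu", "cold"]

cats_a, cats_b, table = contingency_table(blood_type, diagnosis)
for a in cats_a:
    print(a, [table[(a, b)] for b in cats_b])
```

In this sketch the cell counts always sum to the number of records, which gives a simple sanity check on the result.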


Data splitting · Privacy · Categorical data · Cloud computing · Contingency tables · Distance covariance


Acknowledgments and Disclaimer

Partial support for this work has been received from the European Commission (projects H2020-644024 “CLARUS” and H2020-700540 “CANVAS”), from the Government of Catalonia (ICREA Acadèmia Prize to J. Domingo-Ferrer and grant 2014 SGR 537), and from the Spanish Government (projects TIN2014-57364-C2-1-R “SmartGlacis” and TIN 2015-70054-REDC). The authors are with the UNESCO Chair in Data Privacy, but the views in this paper are the authors’ own and are not necessarily shared by UNESCO.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Sara Ricci (1)
  • Josep Domingo-Ferrer (1)
  • David Sánchez (1)
  1. UNESCO Chair in Data Privacy, Department of Computer Science and Mathematics, Universitat Rovira i Virgili, Tarragona, Catalonia
