Privacy-Preserving Cloud-Based Statistical Analyses on Sensitive Categorical Data

  • Conference paper
Modeling Decisions for Artificial Intelligence (MDAI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9880)

Abstract

We consider the problem of privacy-preserving cloud-based statistical computation on sensitive categorical data. Specifically, we focus on protocols to obtain the contingency matrix and the sample covariance matrix of the categorical data set. A multi-cloud is used not only to store the sensitive data but also to perform computations on them. However, the multi-cloud is semi-honest, that is, it follows the protocols but is not authorized to learn the sensitive data. Hence, the data must be stored and computed on by the multi-cloud in a privacy-preserving format, which we choose to be vertical splitting among the various clouds. We give a comparison of our proposals, based on the secure scalar product, against a benchmark protocol consisting of downloading plus local computation.
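
To make the link between the contingency matrix and the scalar product concrete, the following minimal sketch illustrates a standard identity: the count for a pair of categories (a, b) equals the scalar product of the 0/1 indicator vectors of a and b taken over the records. This is not the protocol of the paper. The function names, the toy attributes and the plain, non-secure `scalar_product` are illustrative assumptions; in a privacy-preserving setting with vertical splitting, the two columns would be held by different clouds and the plain product would be replaced by a secure scalar product protocol.

```python
from typing import Dict, List, Sequence, Tuple


def indicator_vectors(column: Sequence[str]) -> Dict[str, List[int]]:
    """Map each category of a column to its 0/1 indicator vector over the records."""
    categories = sorted(set(column))
    return {c: [1 if value == c else 0 for value in column] for c in categories}


def scalar_product(x: Sequence[int], y: Sequence[int]) -> int:
    """Plain scalar product (assumption: stands in for a secure scalar product)."""
    return sum(a * b for a, b in zip(x, y))


def contingency_matrix(
    col_a: Sequence[str], col_b: Sequence[str]
) -> Dict[Tuple[str, str], int]:
    """Build the contingency matrix of two categorical attributes.

    Cell (a, b) is the scalar product of the indicator vector of category a
    in the first attribute and that of category b in the second attribute.
    """
    if len(col_a) != len(col_b):
        raise ValueError("both attributes must cover the same records")
    ind_a = indicator_vectors(col_a)
    ind_b = indicator_vectors(col_b)
    return {
        (a, b): scalar_product(vec_a, vec_b)
        for a, vec_a in ind_a.items()
        for b, vec_b in ind_b.items()
    }


# Hypothetical toy data: each attribute would be stored in a different cloud.
diagnosis = ["flu", "flu", "asthma", "flu"]
region = ["north", "south", "north", "north"]

if __name__ == "__main__":
    for cell, count in contingency_matrix(diagnosis, region).items():
        print(cell, count)
```

Because every cell reduces to one scalar product between vectors held by different parties, the whole matrix can in principle be obtained with one secure scalar product per pair of categories, which is the kind of cost that a comparison against downloading plus local computation has to account for.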

Acknowledgments and Disclaimer

Partial support to this work has been received from the European Commission (projects H2020-644024 “CLARUS” and H2020-700540 “CANVAS”), from the Government of Catalonia (ICREA Acadèmia Prize to J. Domingo-Ferrer and grant 2014 SGR 537), and from the Spanish Government (projects TIN2014-57364-C2-1-R “SmartGlacis” and TIN2015-70054-REDC). The authors are with the UNESCO Chair in Data Privacy, but the views in this paper are the authors’ own and are not necessarily shared by UNESCO.

Author information

Corresponding author

Correspondence to Josep Domingo-Ferrer.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ricci, S., Domingo-Ferrer, J., Sánchez, D. (2016). Privacy-Preserving Cloud-Based Statistical Analyses on Sensitive Categorical Data. In: Torra, V., Narukawa, Y., Navarro-Arribas, G., Yañez, C. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2016. Lecture Notes in Computer Science, vol 9880. Springer, Cham. https://doi.org/10.1007/978-3-319-45656-0_19

  • DOI: https://doi.org/10.1007/978-3-319-45656-0_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45655-3

  • Online ISBN: 978-3-319-45656-0

  • eBook Packages: Computer Science, Computer Science (R0)
