Outsourcing analyses on privacy-protected multivariate categorical data stored in untrusted clouds

  • Josep Domingo-Ferrer (corresponding author)
  • David Sánchez
  • Sara Ricci
  • Mónica Muñoz-Batista
Regular Paper


Outsourcing data storage and computation to the cloud is appealing due to the cost savings it entails. However, when the data to be outsourced contain private information, appropriate protection mechanisms should be implemented by the data controller. Data splitting, which consists of fragmenting the data and storing the fragments in separate clouds for the sake of privacy preservation, is an interesting alternative to encryption in terms of flexibility and efficiency. However, multivariate analyses on data split among various clouds are challenging, and they are even harder when the data are nominal categorical (i.e., textual, non-ordinal), because the standard arithmetic operators cannot be used. In this article, we tackle the problem of outsourcing multivariate analyses on nominal data split over several honest-but-curious clouds. Specifically, we propose several secure protocols to outsource to multiple clouds the computation of a variety of multivariate analyses (frequency-based and semantic-based) on nominal categorical data. Our protocols have been designed to outsource as much workload as possible to the clouds, in order to retain the cost-saving benefits of cloud computing while ensuring that the outsourced data stay split and hence privacy-protected against the clouds. The experiments we report on the Amazon cloud service show that by using our protocols the controller can save nearly all the runtime, because it can integrate the partial results received from the clouds with very little computation.


Keywords: Cloud computing; Data privacy; Data splitting; Nominal data



Partial support to this work has been received from the European Commission (projects H2020-700540 “CANVAS” and H2020-644024 “CLARUS”), from the Government of Catalonia (ICREA Acadèmia Prize to J. Domingo-Ferrer and grant 2017 SGR 705), and from the Spanish Government (projects RTI2018-095094-B-C21 “CONSENT” and TIN2016-80250-R “Sec-MCloud”). The authors are with the UNESCO Chair in Data Privacy, but the views in this paper are the authors’ own and are not necessarily shared by UNESCO.


Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science and Mathematics, UNESCO Chair in Data Privacy, CYBERCAT-Center for Cybersecurity Research of Catalonia, Universitat Rovira i Virgili, Tarragona, Catalonia
  2. Department of Telecommunications, Brno University of Technology, Brno, Czech Republic
