International Journal of Information Security, Volume 13, Issue 6, pp 497–512

Secure and efficient anonymization of distributed confidential databases

  • Javier Herranz
  • Jordi Nin
Regular Contribution


Let us consider the following situation: \(t\) entities (e.g., hospitals) hold different databases containing different records for the same type of confidential (e.g., medical) data. They want to deliver a protected version of this data to third parties (e.g., pharmaceutical researchers), preserving in some way both the utility and the privacy of the original data. This can be done by applying a statistical disclosure control (SDC) method. One possibility is that each entity protects its own database individually, but this strategy provides less utility and privacy than a collective strategy in which the entities cooperate, by means of a distributed protocol, to produce a global protected dataset. In this paper, we investigate the problem of designing distributed protocols for SDC protection methods. We propose a simple, efficient and secure distributed protocol for the specific SDC method of rank shuffling. We run experiments to evaluate the quality of this protocol and to compare the individual and collective strategies for protecting a distributed database. Compared with other distributed versions of SDC methods, the new protocol provides either more security or more efficiency, as we discuss throughout the paper.
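The abstract names rank shuffling without defining it. As a rough illustration only, the sketch below assumes one simple reading of the method applied centrally to a single numerical attribute: records are ordered by value, the sorted sequence is cut into consecutive windows of `p` ranks, and the values inside each window are randomly permuted before being written back to the records that held those ranks. The function name `rank_shuffle` and the window parameter `p` are hypothetical; the exact definition used in the paper, and its distributed, encrypted protocol, are not reproduced here.

```python
import random
from typing import List, Sequence


def rank_shuffle(values: Sequence[float], p: int, rng: random.Random) -> List[float]:
    """Hypothetical centralized rank shuffling of one numerical attribute.

    Assumption: values are permuted at random inside consecutive windows of
    p ranks, so every record receives a value whose rank differs from its
    own rank by less than p positions.
    """
    if p < 1:
        raise ValueError("window size p must be at least 1")
    n = len(values)
    # Record indices ordered by original value (ties broken by index).
    order = sorted(range(n), key=lambda i: (values[i], i))
    sorted_vals = [values[i] for i in order]
    protected = [0.0] * n
    for start in range(0, n, p):
        # Copy the window of p consecutive ranks and permute it.
        window = sorted_vals[start:start + p]
        rng.shuffle(window)
        # The value now at rank start+offset goes to the record that held that rank.
        for offset, v in enumerate(window):
            protected[order[start + offset]] = v
    return protected


if __name__ == "__main__":
    # Illustrative toy attribute, not data from the paper.
    ages = [34.0, 51.0, 29.0, 62.0, 45.0, 38.0]
    print(rank_shuffle(ages, p=3, rng=random.Random(7)))
```

Under this reading, the multiset of values (and hence the attribute's mean and variance) is unchanged, while each record's rank moves by less than `p`; a larger `p` gives more protection at the cost of utility.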


Keywords: Statistical disclosure control; Distributed computation; Database security; ElGamal cryptosystem
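The keywords list the ElGamal cryptosystem, but the abstract does not say how the protocol uses it. The sketch below shows one common pattern for ElGamal in protocols among \(t\) entities, offered as an assumption rather than the paper's construction: each entity holds a secret share \(x_i\), the joint public key is \(h = \prod_i g^{x_i}\), any party can re-randomize a ciphertext without changing its plaintext (the property that lets encrypted records be permuted unlinkably), and decryption needs a partial decryption from every entity. All function names are hypothetical, the group parameters are toy values that offer no security, and a real protocol would use a verifiable distributed key generation.

```python
import secrets
from typing import List, Tuple

# Toy parameters (NOT secure): safe prime P = 2*Q + 1; G = 4 generates the
# subgroup of quadratic residues mod P, which has prime order Q.
P = 467
Q = 233
G = 4

Ciphertext = Tuple[int, int]


def keygen_share() -> Tuple[int, int]:
    """Each entity picks a secret share x_i and publishes h_i = G^x_i."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def joint_public_key(public_shares: List[int]) -> int:
    """Joint key h = prod h_i = G^(sum x_i); no single entity knows sum x_i."""
    h = 1
    for hi in public_shares:
        h = h * hi % P
    return h


def encrypt(h: int, m: int) -> Ciphertext:
    """ElGamal encryption of a subgroup element m: (G^r, m * h^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), m * pow(h, r, P) % P


def rerandomize(h: int, c: Ciphertext) -> Ciphertext:
    """Multiply by an encryption of 1: same plaintext, fresh-looking ciphertext."""
    s = secrets.randbelow(Q - 1) + 1
    c1, c2 = c
    return c1 * pow(G, s, P) % P, c2 * pow(h, s, P) % P


def partial_decrypt(x: int, c: Ciphertext) -> int:
    """Decryption share of one entity: c1^x_i."""
    return pow(c[0], x, P)


def combine(c: Ciphertext, partials: List[int]) -> int:
    """Recover m = c2 / prod(c1^x_i); needs the shares of all entities."""
    d = 1
    for di in partials:
        d = d * di % P
    return c[1] * pow(d, -1, P) % P  # modular inverse needs Python 3.8+
```

In this n-of-n sketch, any coalition missing even one entity's share cannot decrypt, which matches the setting where no single data holder should see the others' records in the clear.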



Partial support by the Spanish program CONSOLIDER-INGENIO 2010, under project ARES (CSD2007-00004) is acknowledged. Javier Herranz enjoys a Ramón y Cajal grant, partially funded by the European Social Fund (ESF), from Spanish MINECO Ministry. The work of Jordi Nin is partially supported by the Ministry of Science and Technology of Spain under contract TIN2012-34557, and by the BSC-CNS Severo Ochoa program (SEV-2011-00067).



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Departament de Matemàtica Aplicada 4, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain
  2. Barcelona Supercomputing Center - BSC, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain
