# Secure and efficient anonymization of distributed confidential databases

## Abstract

Consider the following situation: \(t\) entities (e.g., hospitals) hold different databases containing different records for the same type of confidential (e.g., medical) data. They want to release a protected version of this data to third parties (e.g., pharmaceutical researchers) that preserves, to some extent, both the utility and the privacy of the original data. This can be achieved by applying a statistical disclosure control (SDC) method. One option is for each entity to protect its own database individually, but this strategy provides less utility and privacy than a collective strategy in which the entities cooperate, by means of a distributed protocol, to produce a global protected dataset. In this paper, we investigate distributed protocols for SDC protection methods. We propose a simple, efficient and secure distributed protocol for a specific SDC method, rank shuffling. We run experiments to evaluate the quality of this protocol and to compare the individual and collective strategies for protecting a distributed database. Compared with other distributed versions of SDC methods, the new protocol provides either more security or more efficiency, as we discuss throughout the paper.
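To make the contrast between the individual and the collective strategies more concrete, the sketch below shows a generic shuffling transformation on a single numeric attribute. It is an assumption for illustration only, not the paper's rank shuffling definition or its protocol: each value is perturbed with Gaussian noise, records are ranked by their noisy values, and the sorted original values are reassigned in that rank order, so the released column contains exactly the original values in a new arrangement. The function names `rank_shuffle`, `individual_strategy` and `collective_strategy`, and the noise parameter `noise_sd`, are hypothetical. The collective version pools the data in the clear, which is precisely what the secure distributed protocol (ElGamal-based, per the keywords) is meant to avoid; that cryptographic layer is not reproduced here.

```python
import random


def rank_shuffle(values, noise_sd, rng):
    """Hypothetical single-attribute shuffle: perturb, rank, reassign originals."""
    n = len(values)
    # Perturbed copy of the attribute (additive Gaussian noise is an assumption).
    noisy = [v + rng.gauss(0.0, noise_sd) for v in values]
    # Record indices ordered by their noisy value (Python's sort is stable).
    order = sorted(range(n), key=lambda i: noisy[i])
    sorted_originals = sorted(values)
    protected = [None] * n
    # The record with the k-th smallest noisy value gets the k-th smallest original.
    for k, i in enumerate(order):
        protected[i] = sorted_originals[k]
    return protected


def individual_strategy(databases, noise_sd, rng):
    # Each entity protects its own records; values never move across entities.
    return [rank_shuffle(db, noise_sd, rng) for db in databases]


def collective_strategy(databases, noise_sd, rng):
    # Illustration only: the union is built in the clear here, whereas the
    # paper computes the global protected dataset with a secure protocol.
    pooled = [v for db in databases for v in db]
    return rank_shuffle(pooled, noise_sd, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    hospitals = [[120, 135, 150], [110, 140], [160, 125, 145]]
    print(individual_strategy(hospitals, 10.0, rng))
    print(collective_strategy(hospitals, 10.0, rng))
```

In this toy setting, the individual strategy can only permute values within each entity's own records, while the collective strategy can permute them across all records; this is one possible intuition for why cooperation may help, though the paper's actual comparison rests on its own experiments.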

## Keywords

Statistical disclosure control, Distributed computation, Database security, ElGamal cryptosystem

## Notes

### Acknowledgments

Partial support by the Spanish program CONSOLIDER-INGENIO 2010, under project ARES (CSD2007-00004), is acknowledged. Javier Herranz enjoys a *Ramón y Cajal* grant, partially funded by the European Social Fund (ESF), from the Spanish MINECO Ministry. The work of Jordi Nin is partially supported by the Ministry of Science and Technology of Spain under contract TIN2012-34557, and by the BSC-CNS Severo Ochoa program (SEV-2011-00067).
