Methods to Mitigate Risk of Composition Attack in Independent Data Publications

  • Jiuyong Li
  • Sarowar A. Sattar
  • Muzammil M. Baig
  • Jixue Liu
  • Raymond Heatherly
  • Qiang Tang
  • Bradley Malin

Abstract

Data publication is a simple and cost-effective approach for data sharing across organizations. Data anonymization is a central technique in privacy-preserving data publication. Many methods have been proposed to anonymize individual datasets and multiple datasets released by the same data publisher. In real life, a dataset is rarely isolated, and two datasets published by two organizations may contain the records of the same individuals. For example, patients might have visited two hospitals for follow-up or specialized treatment regarding a disease, and their records are independently anonymized and published. Although each published dataset poses a small privacy risk, the intersection of two datasets may severely compromise the privacy of the individuals. The attack using the intersection of datasets published by different organizations is called a composition attack. Some research has studied methods for anonymizing data to prevent a composition attack for independent data releases, where one data publisher has no knowledge of the records of another data publisher. In this chapter, we discuss two exemplar methods, a randomization-based approach and a generalization-based approach, to mitigate the risk of composition attacks. In the randomization method, noise is added to the original values to make it difficult for an adversary to pinpoint an individual’s record in a published dataset. In the generalization method, records are grouped by their potentially identifying attributes, and those attribute values are generalized to the same value within each group so that the individuals in a group are indistinguishable. We discuss and experimentally demonstrate the strengths and weaknesses of both types of methods. We also present a mixed data publication framework, in which a small proportion of the records are managed and published centrally while the other records are managed and published locally by different organizations, to reduce the risk of the composition attack and improve the overall utility of the data.
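As an illustration of the composition attack described above, the following minimal sketch intersects the sensitive values of the groups that match a victim in two independently generalized releases. It is not the chapter's method or experimental setup: the two hospital tables, the attribute names (`age_range`, `zip_prefix`, `diseases`), and the helper functions are hypothetical and chosen only to show why the intersection can leak more than either release alone.

```python
# Hypothetical toy data: each release is a list of generalized groups.
# Quasi-identifiers (age, zip code) are coarsened; each group lists the
# sensitive values (diseases) of the records it contains.
hospital_a = [
    {"age_range": (20, 29), "zip_prefix": "130",
     "diseases": {"flu", "asthma", "diabetes"}},
    {"age_range": (30, 39), "zip_prefix": "130",
     "diseases": {"cancer", "flu", "hepatitis"}},
]
hospital_b = [
    {"age_range": (25, 34), "zip_prefix": "1305",
     "diseases": {"diabetes", "bronchitis", "migraine"}},
    {"age_range": (35, 44), "zip_prefix": "1305",
     "diseases": {"flu", "cancer", "ulcer"}},
]


def candidates(release, age, zipcode):
    """Sensitive values of every group whose generalized attributes cover the victim."""
    found = set()
    for group in release:
        low, high = group["age_range"]
        if low <= age <= high and zipcode.startswith(group["zip_prefix"]):
            found |= group["diseases"]
    return found


def composition_attack(releases, age, zipcode):
    """Intersect the candidate sets from all releases (assumes at least one release)."""
    candidate_sets = [candidates(release, age, zipcode) for release in releases]
    return set.intersection(*candidate_sets)


# The adversary knows the victim's age and zip code, and that the victim
# visited both hospitals.
leaked = composition_attack([hospital_a, hospital_b], age=27, zipcode="13053")
print(leaked)
```

In this toy case each release on its own leaves three candidate diseases for the victim, while the intersection leaves a single one, which is the kind of risk the randomization-based and generalization-based methods aim to reduce.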

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Jiuyong Li (1)
  • Sarowar A. Sattar (1)
  • Muzammil M. Baig (2)
  • Jixue Liu (1)
  • Raymond Heatherly (3)
  • Qiang Tang (4)
  • Bradley Malin (5)

  1. School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, Australia
  2. InterSect Alliance International Pty Ltd, Adelaide Area, Australia
  3. Department of Biomedical Informatics, Vanderbilt University, Nashville, USA
  4. APSIA group, SnT, University of Luxembourg, Walferdange, Luxembourg
  5. Departments of Biomedical Informatics and EE and CS, Vanderbilt University, Nashville, USA
