The VLDB Journal, Volume 20, Issue 4, pp 567–588

Anonymity meets game theory: secure data integration with malicious participants

  • Noman Mohammed
  • Benjamin C. M. Fung
  • Mourad Debbabi
Regular Paper

Abstract

Data integration methods enable different data providers to flexibly integrate their expertise and deliver highly customizable services to their customers. Nonetheless, combining data from different sources could potentially reveal person-specific sensitive information. In VLDBJ 2006, Jiang and Clifton (Very Large Data Bases J (VLDBJ) 15(4):316–333, 2006) proposed a secure Distributed k-Anonymity (DkA) framework for integrating two private data tables into a k-anonymous table, where each private table is a vertical partition on the same set of records. Their DkA framework does not scale to large data sets. Moreover, DkA is limited to a two-party scenario, and the parties are assumed to be semi-honest. In this paper, we propose two algorithms to securely integrate private data from multiple parties (data providers). Our first algorithm achieves the k-anonymity privacy model in a semi-honest adversary model. Our second algorithm employs a game-theoretic approach to thwart malicious participants and to ensure fair and honest participation of multiple data providers in the data integration process. Moreover, we study and resolve a real-life privacy problem in data sharing for the financial industry in Sweden. Experiments on the real-life data demonstrate that our proposed algorithms can effectively retain the essential information in anonymous data for data analysis and are scalable for anonymizing large data sets.
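
To make the privacy concern in the abstract concrete, the following Python sketch shows how two vertical partitions over the same records can each be k-anonymous on their own while their join is not. It is only an illustration under stated assumptions: the function names (`join_vertical_partitions`, `is_k_anonymous`), the attribute names, and the toy records are all hypothetical, and the join is done in plaintext. It is not the secure protocol proposed in the paper, nor the DkA framework.

```python
from collections import Counter

def join_vertical_partitions(*tables):
    """Join tables on their shared record IDs.

    Each table maps a record ID to a dict of attributes (hypothetical layout).
    """
    shared_ids = set(tables[0])
    for table in tables[1:]:
        shared_ids &= set(table)
    joined = {}
    for rid in sorted(shared_ids):
        row = {}
        for table in tables:
            row.update(table[rid])
        joined[rid] = row
    return joined

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(c >= k for c in counts.values())

# Toy data (made up for illustration): party A holds "job", party B holds "age".
party_a = {
    1: {"job": "Professional"},
    2: {"job": "Professional"},
    3: {"job": "Artist"},
    4: {"job": "Artist"},
}
party_b = {
    1: {"age": "[30-60)"},
    2: {"age": "[1-30)"},
    3: {"age": "[30-60)"},
    4: {"age": "[1-30)"},
}

joined = join_vertical_partitions(party_a, party_b)

a_alone = is_k_anonymous(party_a.values(), ["job"], k=2)
b_alone = is_k_anonymous(party_b.values(), ["age"], k=2)
combined = is_k_anonymous(joined.values(), ["job", "age"], k=2)
print(a_alone, b_alone, combined)
```

In this toy data each party's table satisfies 2-anonymity in isolation, but every (job, age) combination in the joined table is unique, which is the kind of disclosure risk that motivates anonymizing the integrated table rather than each partition separately.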

Keywords

k-anonymity, Secure data integration, Privacy, Classification

References

  1. Adam, N.R., Wortman, J.C.: Security control methods for statistical databases. ACM Comput. Surv. 21(4), 515–556 (1989)
  2. Agrawal, R., Terzi, E.: On honesty in sovereign information sharing. In: Proceedings of the EDBT (2006)
  3. Agrawal, R., Evfimievski, A., Srikant, R.: Information sharing across private databases. In: Proceedings of ACM SIGMOD, San Diego, CA (2003)
  4. Axelrod, R.: The Evolution of Cooperation. Basic Books, New York (1984)
  5. Bayardo, R.J., Agrawal, R.: Data privacy through optimal k-anonymization. In: ICDE (2005)
  6. Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: the SuLQ framework. In: PODS (2005)
  7. Brodsky, A., Farkas, C., Jajodia, S.: Secure databases: Constraints, inference channels, and monitoring disclosures. IEEE Trans. Knowl. Data Eng. 12, 900–919 (2000)
  8. Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., Zhu, M.Y.: Tools for privacy preserving distributed data mining. ACM SIGKDD Explor. Newsl. 4(2), 28–34 (2002)
  9. Dayal, U., Hwang, H.Y.: View definition and generalization for database integration in a multidatabase system. IEEE Trans. Softw. Eng. 10(6), 628–645 (1984)
  10. Denning, D., Schlorer, J.: Inference controls for statistical databases. IEEE Comput. 16(7), 69–82 (1983)
  11. Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: PODS (2003)
  12. Du, W., Zhan, Z.: Building decision tree classifier on private data. In: Workshop on Privacy, Security, and Data Mining at the IEEE ICDM (2002)
  13. Du, W., Han, Y.S., Chen, S.: Privacy-preserving multivariate statistical analysis: linear regression and classification. In: Proceedings of the SIAM International Conference on Data Mining (SDM), Florida (2004)
  14. Dwork, C.: Differential privacy. In: ICALP (2006)
  15. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: TCC (2006)
  16. Farkas, C., Jajodia, S.: The inference problem: A survey. ACM SIGKDD Explor. Newsl. 4(2), 6–11 (2003)
  17. Fung, B.C.M., Wang, K., Yu, P.S.: Anonymizing classification data for privacy preservation. IEEE TKDE 19(5), 711–725 (2007)
  18. Fung, B.C.M., Wang, K., Chen, R., Yu, P.S.: Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surv. 42(4), 14:1–14:53 (2010)
  19. Hinke, T.: Inference aggregation detection in database management systems. In: IEEE S&P (1988)
  20. Inan, A., Kantarcioglu, M., Bertino, E., Scannapieco, M.: A hybrid approach to private record linkage. In: Proceedings of the Int’l Conference on Data Engineering (2008)
  21. Inan, A., Kantarcioglu, M., Ghinita, G., Bertino, E.: Private record matching using differential privacy. In: Proceedings of the EDBT (2010)
  22. Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: SIGKDD (2002)
  23. Jiang, W., Clifton, C.: Privacy-preserving distributed k-anonymity. In: BDSec (2005)
  24. Jiang, W., Clifton, C.: A secure distributed framework for achieving k-anonymity. Very Large Data Bases J. (VLDBJ) 15(4), 316–333 (2006)
  25. Jiang, W., Clifton, C., Kantarcioglu, M.: Transforming semi-honest protocols to ensure accountability. Data Knowl. Eng. 65(1), 57–74 (2008)
  26. Jurczyk, P., Xiong, L.: Distributed anonymization: achieving privacy for both data subjects and data providers. In: DBSec (2009)
  27. Kantarcioglu, M., Kardes, O.: Privacy-preserving data mining in the malicious model. Int. J. Inf. Comput. Secur. 2(4), 353–375 (2008)
  28. Kantarcioglu, M., Xi, B., Clifton, C.: A game theoretical model for adversarial learning. In: Proceedings of the NGDM Workshop (2007)
  29. Kardes, O., Kantarcioglu, M.: Privacy-preserving data mining applications in malicious model. In: Proceedings of the PADM Workshop (2007)
  30. Kargupta, H., Das, K., Liu, K.: A game theoretic approach toward multi-party privacy-preserving distributed data mining. In: Proceedings of the PKDD (2007)
  31. Kleinberg, J., Papadimitriou, C., Raghavan, P.: On the value of private information. In: TARK (2001)
  32. Layfield, R., Kantarcioglu, M., Thuraisingham, B.: Incentive and trust issues in assured information sharing. In: Proceedings of the CollaborateComm (2008)
  33. LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Workload-aware anonymization. In: SIGKDD (2006)
  34. Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and ℓ-diversity. In: ICDE (2007)
  35. Lindell, Y., Pinkas, B.: Privacy preserving data mining. J. Cryptol. 15(3), 177–206 (2002)
  36. Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: ℓ-diversity: privacy beyond k-anonymity. ACM TKDD 1(1) (2007)
  37. Malvestuto, F.M., Mezzini, M., Moscarini, M.: Auditing sum-queries to make a statistical database secure. ACM Trans. Inf. Syst. Secur. 9(1), 31–60 (2006)
  38. Mohammed, N., Fung, B.C.M., Hung, P.C.K., Lee, C.: Anonymizing healthcare data: a case study on the blood transfusion service. In: SIGKDD (2009a)
  39. Mohammed, N., Fung, B.C.M., Wang, K., Hung, P.C.K.: Privacy-preserving data mashup. In: EDBT (2009b)
  40. Mohammed, N., Fung, B.C.M., Hung, P.C.K., Lee, C.: Centralized and distributed anonymization for high-dimensional healthcare data. ACM Trans. Knowl. Discov. Data (TKDD) 4(4), 18:1–18:33 (2010)
  41. Nash, J.: Non-cooperative games. Ann. Math. 54(2), 286–295 (1951)
  42. Newman, D.J., Hettich, S., Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases. http://archive.ics.uci.edu/ml/ (1998)
  43. Nisan, N.: Algorithms for selfish agents. In: Proceedings of the STACS (1999)
  44. Osborne, M.J., Rubinstein, A.: A Course in Game Theory. The MIT Press, Cambridge, MA (1994)
  45. Pinkas, B.: Cryptographic techniques for privacy-preserving data mining. ACM SIGKDD Explor. Newsl. 4(2), 12–19 (2002)
  46. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, Los Altos (1993)
  47. Samarati, P.: Protecting respondents’ identities in microdata release. IEEE TKDE 13(6), 1010–1027 (2001)
  48. Sweeney, L.: Datafly: a system for providing anonymity in medical data. In: Proceedings of the DBSec (1998)
  49. Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 571–588 (2002a)
  50. Sweeney, L.: k-anonymity: a model for protecting privacy. In: International Journal on Uncertainty, Fuzziness and Knowledge-based Systems (2002b)
  51. Thuraisingham, B.M.: Security checking in relational database management systems augmented with inference engines. Comput. Secur. 6(6), 479–492 (1987)
  52. Vaidya, J., Clifton, C.: Privacy preserving association rule mining in vertically partitioned data. In: Proceedings of the ACM SIGKDD (2002)
  53. Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the ACM SIGKDD (2003)
  54. Wang, K., Fung, B.C.M., Yu, P.S.: Handicapping attacker’s confidence: An alternative to k-anonymization. KAIS 11(3), 345–368 (2007)
  55. Wiederhold, G.: Intelligent integration of information. In: Proceedings of ACM SIGMOD, pp 434–437 (1993)
  56. Wong, R.C.W., Li, J., Fu, A.W.C., Wang, K.: (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: SIGKDD (2006)
  57. Xiao, X., Tao, Y.: Anatomy: simple and effective privacy preservation. In: VLDB (2006)
  58. Xiao, X., Yi, K., Tao, Y.: The hardness and approximation algorithms for l-diversity. In: EDBT (2010)
  59. Yang, Z., Zhong, S., Wright, R.N.: Privacy-preserving classification of customer data without loss of accuracy. In: Proceedings of the SDM (2005)
  60. Yao, A.C.: Protocols for secure computations. In: Proceedings of the IEEE FOCS (1982)
  61. Zhang, N., Zhao, W.: Distributed privacy preserving information sharing. In: Proceedings of the VLDB (2005)

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Noman Mohammed (1)
  • Benjamin C. M. Fung (1)
  • Mourad Debbabi (1)
  1. Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Canada
