Anonymity meets game theory: secure data integration with malicious participants
Abstract
Data integration methods enable different data providers to flexibly integrate their expertise and deliver highly customizable services to their customers. Nonetheless, combining data from different sources could potentially reveal person-specific sensitive information. In VLDBJ 2006, Jiang and Clifton (Very Large Data Bases J (VLDBJ) 15(4):316–333, 2006) propose a secure Distributed k-Anonymity (DkA) framework for integrating two private data tables into a k-anonymous table, where each private table is a vertical partition of the same set of records. Their DkA framework does not scale to large data sets. Moreover, DkA is limited to a two-party scenario, and the parties are assumed to be semi-honest. In this paper, we propose two algorithms to securely integrate private data from multiple parties (data providers). Our first algorithm achieves the k-anonymity privacy model in a semi-honest adversary model. Our second algorithm employs a game-theoretic approach to thwart malicious participants and to ensure fair and honest participation of multiple data providers in the data integration process. Moreover, we study and resolve a real-life privacy problem in data sharing for the financial industry in Sweden. Experiments on the real-life data demonstrate that our proposed algorithms can effectively retain the essential information in anonymous data for data analysis and are scalable for anonymizing large data sets.
Keywords
k-anonymity, Secure data integration, Privacy, Classification
References
- 1. Adam, N.R., Wortman, J.C.: Security control methods for statistical databases. ACM Comput. Surv. 21(4), 515–556 (1989)
- 2. Agrawal, R., Terzi, E.: On honesty in sovereign information sharing. In: Proceedings of the EDBT (2006)
- 3. Agrawal, R., Evfimievski, A., Srikant, R.: Information sharing across private databases. In: Proceedings of ACM SIGMOD, San Diego, CA (2003)
- 4. Axelrod, R.: The Evolution of Cooperation. Basic Books, New York (1984)
- 5. Bayardo, R.J., Agrawal, R.: Data privacy through optimal k-anonymization. In: ICDE (2005)
- 6. Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: the SuLQ framework. In: PODS (2005)
- 7. Brodsky, A., Farkas, C., Jajodia, S.: Secure databases: Constraints, inference channels, and monitoring disclosures. IEEE Trans. Knowl. Data Eng. 12, 900–919 (2000)
- 8. Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., Zhu, M.Y.: Tools for privacy preserving distributed data mining. ACM SIGKDD Explor. Newsl. 4(2), 28–34 (2002)
- 9. Dayal, U., Hwang, H.Y.: View definition and generalization for database integration in a multidatabase system. IEEE Trans. Softw. Eng. 10(6), 628–645 (1984)
- 10. Denning, D., Schlorer, J.: Inference controls for statistical databases. IEEE Comput. 16(7), 69–82 (1983)
- 11. Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: PODS (2003)
- 12. Du, W., Zhan, Z.: Building decision tree classifier on private data. In: Workshop on Privacy, Security, and Data Mining at the IEEE ICDM (2002)
- 13. Du, W., Han, Y.S., Chen, S.: Privacy-preserving multivariate statistical analysis: linear regression and classification. In: Proceedings of the SIAM International Conference on Data Mining (SDM), Florida (2004)
- 14. Dwork, C.: Differential privacy. In: ICALP (2006)
- 15. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: TCC (2006)
- 16. Farkas, C., Jajodia, S.: The inference problem: A survey. ACM SIGKDD Explor. Newsl. 4(2), 6–11 (2003)
- 17. Fung, B.C.M., Wang, K., Yu, P.S.: Anonymizing classification data for privacy preservation. IEEE TKDE 19(5), 711–725 (2007)
- 18. Fung, B.C.M., Wang, K., Chen, R., Yu, P.S.: Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surv. 42(4), 14:1–14:53 (2010)
- 19. Hinke, T.: Inference aggregation detection in database management systems. In: IEEE S&P (1988)
- 20. Inan, A., Kantarcioglu, M., Bertino, E., Scannapieco, M.: A hybrid approach to private record linkage. In: Proceedings of the Int’l Conference on Data Engineering (2008)
- 21. Inan, A., Kantarcioglu, M., Ghinita, G., Bertino, E.: Private record matching using differential privacy. In: Proceedings of the EDBT (2010)
- 22. Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: SIGKDD (2002)
- 23. Jiang, W., Clifton, C.: Privacy-preserving distributed k-anonymity. In: DBSec (2005)
- 24. Jiang, W., Clifton, C.: A secure distributed framework for achieving k-anonymity. Very Large Data Bases J. (VLDBJ) 15(4), 316–333 (2006)
- 25. Jiang, W., Clifton, C., Kantarcioglu, M.: Transforming semi-honest protocols to ensure accountability. Data Knowl. Eng. 65(1), 57–74 (2008)
- 26. Jurczyk, P., Xiong, L.: Distributed anonymization: achieving privacy for both data subjects and data providers. In: DBSec (2009)
- 27. Kantarcioglu, M., Kardes, O.: Privacy-preserving data mining in the malicious model. Int. J. Inf. Comput. Secur. 2(4), 353–375 (2008)
- 28. Kantarcioglu, M., Xi, B., Clifton, C.: A game theoretical model for adversarial learning. In: Proceedings of the NGDM Workshop (2007)
- 29. Kardes, O., Kantarcioglu, M.: Privacy-preserving data mining applications in malicious model. In: Proceedings of the PADM Workshop (2007)
- 30. Kargupta, H., Das, K., Liu, K.: A game theoretic approach toward multi-party privacy-preserving distributed data mining. In: Proceedings of the PKDD (2007)
- 31. Kleinberg, J., Papadimitriou, C., Raghavan, P.: On the value of private information. In: TARK (2001)
- 32. Layfield, R., Kantarcioglu, M., Thuraisingham, B.: Incentive and trust issues in assured information sharing. In: Proceedings of the CollaborateComm (2008)
- 33. LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Workload-aware anonymization. In: SIGKDD (2006)
- 34. Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and ℓ-diversity. In: ICDE (2007)
- 35. Lindell, Y., Pinkas, B.: Privacy preserving data mining. J. Cryptol. 15(3), 177–206 (2002)
- 36. Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: ℓ-diversity: privacy beyond k-anonymity. ACM TKDD 1(1) (2007)
- 37. Malvestuto, F.M., Mezzini, M., Moscarini, M.: Auditing sum-queries to make a statistical database secure. ACM Trans. Inf. Syst. Secur. 9(1), 31–60 (2006)
- 38. Mohammed, N., Fung, B.C.M., Hung, P.C.K., Lee, C.: Anonymizing healthcare data: a case study on the blood transfusion service. In: SIGKDD (2009a)
- 39. Mohammed, N., Fung, B.C.M., Wang, K., Hung, P.C.K.: Privacy-preserving data mashup. In: EDBT (2009b)
- 40. Mohammed, N., Fung, B.C.M., Hung, P.C.K., Lee, C.: Centralized and distributed anonymization for high-dimensional healthcare data. ACM Trans. Knowl. Discov. Data (TKDD) 4(4), 18:1–18:33 (2010)
- 41. Nash, J.: Non-cooperative games. Ann. Math. 54(2), 286–295 (1951)
- 42. Newman, D.J., Hettich, S., Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases. http://archive.ics.uci.edu/ml/ (1998)
- 43. Nisan, N.: Algorithms for selfish agents. In: Proceedings of the STACS (1999)
- 44. Osborne, M.J., Rubinstein, A.: A Course in Game Theory. The MIT Press, Cambridge, MA (1994)
- 45. Pinkas, B.: Cryptographic techniques for privacy-preserving data mining. ACM SIGKDD Explor. Newsl. 4(2), 12–19 (2002)
- 46. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, Los Altos (1993)
- 47. Samarati, P.: Protecting respondents’ identities in microdata release. IEEE TKDE 13(6), 1010–1027 (2001)
- 48. Sweeney, L.: Datafly: a system for providing anonymity in medical data. In: Proceedings of the DBSec (1998)
- 49. Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 571–588 (2002a)
- 50. Sweeney, L.: k-anonymity: a model for protecting privacy. In: International Journal on Uncertainty, Fuzziness and Knowledge-based Systems (2002b)
- 51. Thuraisingham, B.M.: Security checking in relational database management systems augmented with inference engines. Comput. Secur. 6(6), 479–492 (1987)
- 52. Vaidya, J., Clifton, C.: Privacy preserving association rule mining in vertically partitioned data. In: Proceedings of the ACM SIGKDD (2002)
- 53. Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the ACM SIGKDD (2003)
- 54. Wang, K., Fung, B.C.M., Yu, P.S.: Handicapping attacker’s confidence: An alternative to k-anonymization. KAIS 11(3), 345–368 (2007)
- 55. Wiederhold, G.: Intelligent integration of information. In: Proceedings of ACM SIGMOD, pp 434–437 (1993)
- 56. Wong, R.C.W., Li, J., Fu, A.W.C., Wang, K.: (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: SIGKDD (2006)
- 57. Xiao, X., Tao, Y.: Anatomy: simple and effective privacy preservation. In: VLDB (2006)
- 58. Xiao, X., Yi, K., Tao, Y.: The hardness and approximation algorithms for l-diversity. In: EDBT (2010)
- 59. Yang, Z., Zhong, S., Wright, R.N.: Privacy-preserving classification of customer data without loss of accuracy. In: Proceedings of the SDM (2005)
- 60. Yao, A.C.: Protocols for secure computations. In: Proceedings of the IEEE FOCS (1982)
- 61. Zhang, N., Zhao, W.: Distributed privacy preserving information sharing. In: Proceedings of the VLDB (2005)