Integrating Private Databases for Data Analysis

  • Ke Wang
  • Benjamin C. M. Fung
  • Guozhu Dong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3495)

Abstract

In today’s globally networked society, there is a dual demand for information sharing and information protection. In a typical scenario, two parties wish to integrate their private databases to achieve a common goal that benefits both, provided that their privacy requirements are satisfied. In this paper, we consider the goal of building a classifier over the integrated data while satisfying the k-anonymity privacy requirement. This requirement states that domain values are generalized so that each combination of values on the specified attributes identifies at least k records. In addition, the generalization process must not leak any information more specific than what appears in the final integrated data. We present a practical and efficient solution to this problem.
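
To make the k-anonymity requirement concrete, the following is a minimal sketch of checking it on a single table before and after one generalization step. It is not the paper's method: it ignores the two-party setting and the requirement that nothing beyond the final data is leaked. The records, the attribute names, the `JOB_PARENT` taxonomy, and the helper functions are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical one-level taxonomy: specific job -> more general category.
JOB_PARENT = {
    "Lawyer": "Professional",
    "Doctor": "Professional",
    "Welder": "Blue-collar",
    "Janitor": "Blue-collar",
}


def generalize(records, attribute, parent_of):
    """Return copies of the records with `attribute` replaced by its parent value."""
    return [
        {**r, attribute: parent_of.get(r[attribute], r[attribute])}
        for r in records
    ]


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every value combination on the quasi-identifiers occurs in >= k records."""
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


# Toy data: "job" and "sex" are the specified attributes, "class" is the label.
records = [
    {"job": "Lawyer", "sex": "F", "class": "Y"},
    {"job": "Doctor", "sex": "F", "class": "N"},
    {"job": "Welder", "sex": "M", "class": "N"},
    {"job": "Janitor", "sex": "M", "class": "Y"},
]
qid = ["job", "sex"]

raw_ok = is_k_anonymous(records, qid, 2)        # each (job, sex) pair is unique
generalized = generalize(records, "job", JOB_PARENT)
gen_ok = is_k_anonymous(generalized, qid, 2)    # two groups of two records each
```

In this toy table the raw data fails the check for k = 2, while the generalized data passes it, because generalizing `job` merges the records into two groups of two.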

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Ke Wang (1)
  • Benjamin C. M. Fung (1)
  • Guozhu Dong (2)

  1. Simon Fraser University, Canada
  2. Wright State University, USA
