Integrating Private Databases for Data Analysis

  • Ke Wang
  • Benjamin C. M. Fung
  • Guozhu Dong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3495)


In today’s globally networked society, there is a dual demand for information sharing and information protection. A typical scenario is that two parties wish to integrate their private databases to achieve a common goal beneficial to both, provided that their privacy requirements are satisfied. In this paper, we consider the goal of building a classifier over the integrated data while satisfying the k-anonymity privacy requirement. The k-anonymity requirement states that domain values are generalized so that each value of some specified attributes identifies at least k records. The generalization process must not leak any information more specific than the final integrated data. We present a practical and efficient solution to this problem.
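To make the k-anonymity requirement concrete, the following minimal Python sketch checks whether a small table satisfies it and shows how generalizing one attribute can restore it. The function names (`is_k_anonymous`, `generalize`), the toy records, and the `Professional` parent value are illustrative assumptions, not taken from the paper, and the sketch does not address the two-party setting or the secure integration protocol the paper describes.

```python
from collections import Counter


def is_k_anonymous(records, attributes, k):
    """Return True if every combination of values on `attributes`
    that occurs in `records` is shared by at least k records."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    return all(c >= k for c in counts.values())


def generalize(records, attribute, mapping):
    """Return a copy of `records` in which each value of `attribute`
    is replaced by its more general parent value, if one is given."""
    return [
        {**r, attribute: mapping.get(r[attribute], r[attribute])}
        for r in records
    ]


# Hypothetical toy data: "job" and "sex" are the specified attributes,
# "label" is the class used to train a classifier.
records = [
    {"job": "Engineer", "sex": "M", "label": "Y"},
    {"job": "Lawyer", "sex": "M", "label": "N"},
    {"job": "Engineer", "sex": "F", "label": "Y"},
    {"job": "Lawyer", "sex": "F", "label": "N"},
]

# Hypothetical one-level taxonomy for "job".
job_taxonomy = {"Engineer": "Professional", "Lawyer": "Professional"}

before = is_k_anonymous(records, ["job", "sex"], k=2)
generalized = generalize(records, "job", job_taxonomy)
after = is_k_anonymous(generalized, ["job", "sex"], k=2)
```

In this toy table every (job, sex) combination is unique, so the raw data is not 2-anonymous; after replacing both job values with their common parent, each combination covers two records and the check passes.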





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Ke Wang (1)
  • Benjamin C. M. Fung (1)
  • Guozhu Dong (2)
  1. Simon Fraser University, Canada
  2. Wright State University, USA
