Privacy-Preserving Decision Trees over Vertically Partitioned Data

  • Jaideep Vaidya
  • Chris Clifton
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3654)


Privacy and security concerns can prevent the sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. In this paper, we tackle the problem of classification. We introduce a generalized privacy-preserving variant of the ID3 algorithm for vertically partitioned data distributed over two or more parties. Along with the algorithm, we give a complete proof of security that provides a tight bound on the information revealed.
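As background for the ID3 variant mentioned above, the following is a minimal sketch of the plain, non-private information-gain computation that ID3 uses to choose a split, applied to a toy vertically partitioned table. It is not the protocol from the paper: it pools the columns in one place, which is exactly what a privacy-preserving variant has to avoid. The names `site_a`, `site_b`, the attribute names, and the placement of the class attribute at the second site are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(attribute_values, labels):
    """ID3 information gain obtained by splitting `labels` on an attribute."""
    total = len(labels)
    # Group the class labels by the attribute value found in the same row.
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical vertical partition: both sites describe the same four rows
# (aligned by position), but each site holds different columns.
site_a = {"outlook": ["sunny", "sunny", "rain", "rain"]}
site_b = {
    "windy": ["no", "yes", "no", "yes"],
    "play": ["no", "no", "yes", "yes"],  # class attribute (assumed to live at site B)
}

labels = site_b["play"]
gains = {
    "outlook": information_gain(site_a["outlook"], labels),
    "windy": information_gain(site_b["windy"], labels),
}
best = max(gains, key=gains.get)  # attribute ID3 would split on first
```

Note that scoring `outlook` requires combining a column from one site with the class labels from the other. This cross-site dependency is why vertically partitioned data calls for a secure protocol rather than a local computation.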


Keywords: Leaf Node; Information Gain; Association Rule Mining; Class Attribute; Class Site
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© IFIP International Federation for Information Processing 2005

Authors and Affiliations

  • Jaideep Vaidya (1)
  • Chris Clifton (2)
  1. MSIS Department, Rutgers University, Newark, USA
  2. Department of Computer Science, Purdue University, West Lafayette, USA
