k-Anonymous Decision Tree Induction

  • Arik Friedman
  • Assaf Schuster
  • Ran Wolff
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4213)

Abstract

In this paper we explore an approach to privacy-preserving data mining that relies on the k-anonymity model. The k-anonymity model guarantees that no private information in a table can be linked to a group of fewer than k individuals. We suggest extended definitions of k-anonymity that allow the k-anonymity of a data mining model to be determined. Using these definitions, we present decision tree induction algorithms that are guaranteed to maintain k-anonymity of the learning examples. Experiments show that embedding anonymization within the decision tree induction process provides better accuracy than anonymizing the data first and inducing the tree later.
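
The table-level k-anonymity condition mentioned in the abstract can be illustrated with a short check. This is a minimal sketch of the standard definition for tables only; it is not the extended model-level definitions or the induction algorithms proposed in the paper. The function name `is_k_anonymous`, the toy table, and the choice of `zip` and `age` as quasi-identifiers are assumptions made for illustration.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in `rows` is shared by at least k rows."""
    # Count how many rows fall into each quasi-identifier group.
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized table (illustrative values only).
table = [
    {"zip": "320**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "320**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "321**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "321**", "age": "40-49", "diagnosis": "diabetes"},
]

print(is_k_anonymous(table, ["zip", "age"], k=2))  # True: both groups have 2 rows
print(is_k_anonymous(table, ["zip", "age"], k=3))  # False: no group has 3 rows
```

Each group of rows that share the same quasi-identifier values must contain at least k rows, so a private value such as `diagnosis` cannot be linked to fewer than k individuals through those columns alone.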

Keywords

k-anonymity, privacy-preserving data mining, decision trees

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Arik Friedman (1)
  • Assaf Schuster (1)
  • Ran Wolff (1)
  1. Computer Science Dept., Technion – Israel Institute of Technology
