A Privacy-Preserving Classification Mining Algorithm
Privacy-preserving classification mining is one of the fast-growing sub-areas of data mining. The key research challenge is how to perturb the original data and then build a decision tree from the perturbed data. Using a transition probability matrix, this paper proposes a novel privacy-preserving classification mining algorithm that handles all data types, arbitrary probability distributions of the original data, and perturbation of all attributes (including the label attribute). Experimental results demonstrate that a decision tree built with this algorithm on perturbed data achieves classification accuracy comparable to that of a decision tree built with a non-privacy-preserving algorithm on the original data.
Keywords: Transition Probability Matrix; Split Point; Support Count; Average Classification Accuracy; Split Attribute
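The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind perturbing with a transition probability matrix, not the paper's method. It assumes a single categorical attribute, a row-stochastic and invertible matrix `P` where `P[i][j]` is the probability that category `i` is reported as category `j`, and it estimates the original support counts by solving a linear system. The function names `perturb`, `solve`, and `estimate_counts` are hypothetical and chosen for illustration.

```python
import random


def perturb(values, categories, P, rng):
    """Replace each value with a category drawn from its row of P (assumed row-stochastic)."""
    index = {c: i for i, c in enumerate(categories)}
    return [rng.choices(categories, weights=P[index[v]])[0] for v in values]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [[float(x) for x in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; counts cannot be recovered")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def estimate_counts(perturbed, categories, P):
    """Estimate original support counts from perturbed data.

    Uses E[observed_j] = sum_i original_i * P[i][j], i.e. P^T x = observed.
    Estimates are real-valued and may be slightly negative on small samples.
    """
    n = len(categories)
    observed = [perturbed.count(c) for c in categories]
    P_transposed = [[P[i][j] for i in range(n)] for j in range(n)]
    return solve(P_transposed, observed)


if __name__ == "__main__":
    categories = ["a", "b", "c"]
    # Illustrative matrix: keep the true value with probability 0.8.
    P = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    original = ["a"] * 500 + ["b"] * 300 + ["c"] * 200
    noisy = perturb(original, categories, P, random.Random(0))
    print(estimate_counts(noisy, categories, P))
```

A decision-tree builder could, under these assumptions, use such estimated support counts in place of exact counts when scoring candidate split attributes and split points. How the paper handles continuous attributes and the label attribute is not described in the abstract and is not covered by this sketch.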