In this paper, we investigate a new problem, misleading classification, in which each test instance is associated with an original class and a misleading class. The data owner's goal is to form the training set out of candidate instances such that the data miner is misled into classifying those test instances into their misleading classes rather than their original classes. We discuss two cases of misleading classification. For the case where the classification algorithm is unknown to the data owner, we propose a KNN-based Ranking Algorithm (KRA) that ranks all candidate instances based on the similarities between candidate instances and test instances. For the case where the classification algorithm is known, we propose a Greedy Ranking Algorithm (GRA) that evaluates each candidate instance by building a classifier to predict the test set. In addition, we show how to accelerate GRA incrementally when naive Bayes is employed as the classification algorithm. Experiments on 16 UCI data sets indicate that the candidate instances ranked by KRA achieve promising leaking and misleading rates. When the classification algorithm is known, GRA dramatically outperforms KRA in terms of leaking and misleading rates, though it requires more running time.
Keywords: misleading classification, naive Bayes, K-nearest neighbor
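
To make the idea of similarity-based ranking concrete, the following is a minimal sketch of how candidate instances might be scored against test instances. It is not the paper's KRA: the abstract does not give the scoring rule, so the function name `rank_candidates`, the +1/-1 voting scheme, the Euclidean distance, and the default `k=3` are all assumptions made purely for illustration.

```python
import math


def rank_candidates(candidates, tests, k=3):
    """Rank candidate training instances for a misleading goal.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    for each test instance, each of its k nearest candidates gains one
    point if its label is the test instance's misleading class and loses
    one point if its label is the test instance's original class.

    candidates: list of (features, label)
    tests: list of (features, original_class, misleading_class)
    Returns candidate indices, best-scoring first.
    """
    scores = [0.0] * len(candidates)
    for x_test, original, misleading in tests:
        # Indices of the k candidates closest to this test instance.
        nearest = sorted(
            range(len(candidates)),
            key=lambda i: math.dist(candidates[i][0], x_test),
        )[:k]
        for i in nearest:
            label = candidates[i][1]
            if label == misleading:
                scores[i] += 1.0
            elif label == original:
                scores[i] -= 1.0
    # Python's sort is stable, so ties keep their original candidate order.
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy data: two candidates near the test point, two far away.
candidates = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.0), "b"),
    ((5.0, 5.0), "a"),
    ((5.1, 5.0), "b"),
]
tests = [((0.0, 0.1), "a", "b")]  # original class "a", misleading class "b"
ranking = rank_candidates(candidates, tests, k=2)
```

Under this assumed rule, a data owner would release the top-ranked candidates first, since they sit close to the test instances and carry the misleading label, and would hold back the lowest-ranked ones.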