Abstract
In this paper, we investigate a new problem, misleading classification, in which each test instance is associated with an original class and a misleading class. The data owner's goal is to form a training set out of candidate instances such that the data miner is misled into classifying the test instances into their misleading classes rather than their original classes. We discuss two cases of misleading classification. For the case where the classification algorithm is unknown to the data owner, we propose a KNN-based Ranking Algorithm (KRA) that ranks all candidate instances based on the similarities between candidate instances and test instances. For the case where the classification algorithm is known, we propose a Greedy Ranking Algorithm (GRA) that evaluates each candidate instance by building a classifier to predict the test set. We also show how to accelerate GRA incrementally when naive Bayes is employed as the classification algorithm. Experiments on 16 UCI data sets indicate that the candidate instances ranked by KRA achieve promising leaking and misleading rates. When the classification algorithm is known, GRA dramatically outperforms KRA in terms of leaking and misleading rates, though it requires more running time.
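To make the setting concrete, the following is a minimal sketch of similarity-based candidate ranking in the spirit of KRA. It is not the paper's algorithm: the abstract does not give the scoring rule, so the rule used here (a candidate gains a point when it is among the k nearest candidates of a test instance and carries that instance's misleading class, and loses a point when it carries the original class) is an assumption, as are the names `knn_rank`, `misleading_rate`, the Euclidean distance, the 1-NN stand-in for the data miner's classifier, and the toy data.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature tuples (an assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_rank(candidates, tests, k=2):
    """Rank candidate indices by an assumed neighbor-voting score.

    candidates: list of (features, label)
    tests: list of (features, original_class, misleading_class)
    """
    scores = [0] * len(candidates)
    for x, original, misleading in tests:
        # Indices of the k candidates closest to this test instance.
        nearest = sorted(range(len(candidates)),
                         key=lambda i: dist(candidates[i][0], x))[:k]
        for i in nearest:
            label = candidates[i][1]
            if label == misleading:
                scores[i] += 1   # pulls the test instance toward its misleading class
            elif label == original:
                scores[i] -= 1   # pulls the test instance toward its original class
    # Highest score first; sorted() is stable, so ties keep index order.
    return sorted(range(len(candidates)), key=lambda i: -scores[i])

def predict_1nn(train, x):
    """1-nearest-neighbor prediction, a stand-in for the data miner's classifier."""
    return min(train, key=lambda inst: dist(inst[0], x))[1]

def misleading_rate(train, tests):
    """Fraction of test instances classified into their misleading class."""
    hits = sum(predict_1nn(train, x) == misleading for x, _, misleading in tests)
    return hits / len(tests)

# Hypothetical toy data: two clusters, each containing both class labels.
candidates = [((0.0, 0.0), "A"), ((0.1, 0.0), "B"),
              ((5.0, 5.0), "B"), ((5.1, 5.0), "A")]
tests = [((0.0, 0.1), "A", "B"),   # original class A, misleading class B
         ((5.0, 5.1), "B", "A")]   # original class B, misleading class A

ranking = knn_rank(candidates, tests, k=2)
top = [candidates[i] for i in ranking[:2]]   # release only the top-ranked candidates
print(ranking, misleading_rate(top, tests))
```

In this toy setup, the data owner would release only the top-ranked candidates as the training set. A greedy variant for a known classifier would instead retrain the classifier for each candidate and keep the one that most improves the misleading rate, which costs more running time.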
Cite this article
Jiang, H., Xuan, J., Ren, Z. et al. Misleading classification. Sci. China Inf. Sci. 57, 1–17 (2014). https://doi.org/10.1007/s11432-013-4798-5