Science China Information Sciences, Volume 57, Issue 5, pp 1–17

Misleading classification

  • He Jiang
  • JiFeng Xuan
  • ZhiLei Ren
  • YouXi Wu
  • XinDong Wu
Research Paper

Abstract

In this paper, we investigate a new problem, misleading classification, in which each test instance is associated with both an original class and a misleading class. The goal of the data owner is to form the training set out of candidate instances such that the data miner is misled into classifying those test instances into their misleading classes rather than their original classes. We discuss two cases of misleading classification. For the case where the classification algorithm is unknown to the data owner, we propose a KNN-based Ranking Algorithm (KRA), which ranks all candidate instances by their similarities to the test instances. For the case where the classification algorithm is known, we propose a Greedy Ranking Algorithm (GRA), which evaluates each candidate instance by building a classifier to predict the test set. In addition, we show how to accelerate GRA in an incremental way when naive Bayes is employed as the classification algorithm. Experiments on 16 UCI data sets indicate that the candidate instances ranked by KRA achieve promising leaking and misleading rates. When the classification algorithm is known, GRA dramatically outperforms KRA in terms of leaking and misleading rates, though it requires more running time.
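The paper itself does not give the algorithm details in this abstract, but the KRA idea — rank candidate instances by how similar they are to the test instances they could push toward a misleading class — can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function name `kra_rank`, the use of Euclidean distance as the similarity measure, and the choice to score a candidate only against test instances whose misleading class matches the candidate's label are all hypothetical, not taken from the paper.

```python
import numpy as np

def kra_rank(candidates, cand_labels, test_instances, mislead_labels, k=3):
    """Hypothetical KRA-style ranking: score each candidate by its
    similarity (negative mean distance to its k nearest relevant test
    instances) and return candidate indices, best first."""
    scores = []
    for x, y in zip(candidates, cand_labels):
        # Only test instances whose misleading class equals this
        # candidate's label can be "helped" by adding the candidate.
        relevant = [t for t, m in zip(test_instances, mislead_labels) if m == y]
        if not relevant:
            scores.append(-np.inf)  # candidate cannot mislead any test instance
            continue
        dists = sorted(np.linalg.norm(np.asarray(relevant) - np.asarray(x), axis=1))
        # Higher score = closer to the test instances it could mislead.
        scores.append(-float(np.mean(dists[:k])))
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

A GRA-style procedure would instead, for each candidate, retrain the known classifier (e.g. naive Bayes) with that candidate added and measure the resulting misleading rate on the test set — far more expensive per candidate, which is why the abstract mentions an incremental acceleration for naive Bayes.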

Keywords

misleading classification, naive Bayes, K-nearest neighbor



Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • He Jiang (1)
  • JiFeng Xuan (1)
  • ZhiLei Ren (1)
  • YouXi Wu (2)
  • XinDong Wu (3)
  1. School of Software, Dalian University of Technology, Dalian, China
  2. School of Computer Science and Software, Hebei University of Technology, Tianjin, China
  3. Computer Science Department, University of Vermont, Burlington, USA
