Misleading classification

  • Research Paper
Science China Information Sciences

Abstract

In this paper, we investigate a new problem, misleading classification, in which each test instance is associated with an original class and a misleading class. The data owner's goal is to form a training set out of candidate instances such that the data miner will be misled into classifying the test instances into their misleading classes rather than their original classes. We discuss two cases of misleading classification. For the case where the classification algorithm is unknown to the data owner, a KNN-based Ranking Algorithm (KRA) is proposed to rank all candidate instances based on the similarities between candidate instances and test instances. For the case where the classification algorithm is known, we propose a Greedy Ranking Algorithm (GRA), which evaluates each candidate instance by building a classifier to predict the test set. In addition, we show how to accelerate GRA incrementally when naive Bayes is employed as the classification algorithm. Experiments on 16 UCI data sets indicate that the candidate instances ranked by KRA achieve promising leaking and misleading rates. When the classification algorithm is known, GRA can dramatically outperform KRA in terms of leaking and misleading rates, though it requires more running time.
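To make the similarity-based ranking idea concrete, the following is a minimal sketch and not the paper's KRA. The abstract does not give the scoring rule, so everything here is an assumption for illustration: the inverse-distance `similarity` function, the signed score (a candidate gains when its label matches a nearby test instance's misleading class and loses when it matches the original class), and the function name `rank_candidates`.

```python
import math


def similarity(a, b):
    """Inverse-distance similarity in (0, 1]. Assumed choice, not from the paper."""
    return 1.0 / (1.0 + math.dist(a, b))


def rank_candidates(candidates, tests):
    """Return candidate indices, most useful for misleading first.

    candidates: list of (features, label)
    tests:      list of (features, original_class, misleading_class)
    The scoring rule below is hypothetical and only illustrates the idea
    of ranking candidates by their similarity to the test instances.
    """
    scored = []
    for idx, (c_feat, c_label) in enumerate(candidates):
        score = 0.0
        for t_feat, original, misleading in tests:
            s = similarity(c_feat, t_feat)
            if c_label == misleading:
                score += s  # pulls the test instance toward its misleading class
            elif c_label == original:
                score -= s  # would help reveal the original class
        scored.append((score, idx))
    # Highest score first; ties broken by original index for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]


candidates = [
    ((0.0, 0.0), "A"),  # close to the test instance, carries its original class
    ((0.1, 0.0), "B"),  # close to the test instance, carries its misleading class
    ((5.0, 5.0), "B"),  # far away, carries its misleading class
]
tests = [((0.0, 0.1), "A", "B")]
ranking = rank_candidates(candidates, tests)
```

In this toy setup the nearby candidate labelled with the misleading class ranks first and the nearby candidate labelled with the original class ranks last. A data owner following this kind of ranking would release candidates from the top of the list.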

Author information

Corresponding author

Correspondence to He Jiang.

About this article

Cite this article

Jiang, H., Xuan, J., Ren, Z. et al. Misleading classification. Sci. China Inf. Sci. 57, 1–17 (2014). https://doi.org/10.1007/s11432-013-4798-5

  • DOI: https://doi.org/10.1007/s11432-013-4798-5
