Misleading classification

Jiang, He; Xuan, JiFeng; Ren, ZhiLei; Wu, YouXi; Wu, XinDong

doi:10.1007/s11432-013-4798-5

Research Paper
Published: 25 January 2014

Volume 57, pages 1–17 (2014)

Science China Information Sciences

He Jiang¹, JiFeng Xuan¹, ZhiLei Ren¹, YouXi Wu² & XinDong Wu³

Abstract

In this paper, we investigate a new problem, misleading classification, in which each test instance is associated with an original class and a misleading class. The data owner's goal is to form the training set out of the candidate instances such that the data miner is misled into classifying those test instances into their misleading classes rather than their original classes. We discuss two cases of misleading classification. For the case where the classification algorithm is unknown to the data owner, we propose a KNN-based Ranking Algorithm (KRA), which ranks all candidate instances by the similarities between candidate instances and test instances. For the case where the classification algorithm is known, we propose a Greedy Ranking Algorithm (GRA), which evaluates each candidate instance by building a classifier to predict the test set. We also show how to accelerate GRA incrementally when naive Bayes is employed as the classification algorithm. Experiments on 16 UCI data sets indicate that the candidate instances ranked by KRA can achieve promising leaking and misleading rates. When the classification algorithm is known, GRA can dramatically outperform KRA in terms of leaking and misleading rates, at the cost of longer running time.
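The two ranking strategies described in the abstract can be illustrated with a small Python sketch. This is not the authors' KRA or GRA. The scoring rule in `knn_rank` (+1 when a nearby test instance's misleading class matches the candidate's label, -1 when its original class matches), the Euclidean distance, the 1-nearest-neighbour classifier standing in for the "known" classification algorithm in `greedy_rank`, and the `misleading_rate` definition (fraction of test instances predicted as their misleading class) are all assumptions made for illustration. Every function name is hypothetical, and the incremental naive Bayes acceleration is not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_rank(candidates: List[Vector], cand_labels: List[str], tests: List[Vector],
             original: List[str], misleading: List[str], k: int = 3) -> List[int]:
    """Hypothetical KRA-style ranking based on candidate-to-test similarity.

    Each candidate looks at its k nearest test instances: +1 for every neighbour whose
    misleading class equals the candidate's label, -1 for every neighbour whose
    original class equals it. Returns candidate indices, best first.
    """
    scored = []
    for i, (x, y) in enumerate(zip(candidates, cand_labels)):
        nearest = sorted(range(len(tests)), key=lambda j: euclidean(x, tests[j]))[:k]
        score = 0
        for j in nearest:
            if y == misleading[j]:
                score += 1  # candidate pulls test j toward its misleading class
            elif y == original[j]:
                score -= 1  # candidate would reinforce (leak) the original class
        scored.append((score, i))
    # Highest score first; ties broken by the lower candidate index.
    return [i for _, i in sorted(scored, key=lambda t: (-t[0], t[1]))]


def predict_1nn(train_x: List[Vector], train_y: List[str], x: Vector) -> str:
    """Stand-in classifier: label of the closest training instance (first one on ties)."""
    best = min(range(len(train_x)), key=lambda i: euclidean(train_x[i], x))
    return train_y[best]


def misleading_rate(train_x: List[Vector], train_y: List[str],
                    tests: List[Vector], misleading: List[str]) -> float:
    """Assumed metric: fraction of test instances predicted as their misleading class."""
    hits = sum(predict_1nn(train_x, train_y, t) == m for t, m in zip(tests, misleading))
    return hits / len(tests)


def greedy_rank(candidates: List[Vector], cand_labels: List[str],
                tests: List[Vector], misleading: List[str]) -> List[int]:
    """Hypothetical GRA-style ranking for a known classifier (here: 1-NN).

    At each step, add the candidate whose inclusion yields the highest misleading
    rate on the test set; ties go to the earliest remaining candidate.
    """
    chosen: List[int] = []
    remaining = list(range(len(candidates)))

    def rate_with(i: int) -> float:
        # Retrain the stand-in classifier on the chosen candidates plus candidate i.
        idx = chosen + [i]
        return misleading_rate([candidates[j] for j in idx],
                               [cand_labels[j] for j in idx], tests, misleading)

    while remaining:
        best = max(remaining, key=rate_with)  # max() keeps the first maximal item
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With one-dimensional candidates [0.0], [1.0], [5.0], [6.0] labelled A, B, A, B and two test instances [0.7], [5.3] whose original class is A and misleading class is B, `knn_rank(..., k=1)` ranks candidates 1 and 3 first, and training the toy 1-NN classifier on just those two gives a misleading rate of 1.0. The greedy variant retrains the classifier once per remaining candidate at every step, which illustrates why a classifier-in-the-loop ranking costs more running time than a similarity-based one.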

Author information

Authors and Affiliations

School of Software, Dalian University of Technology, Dalian, 116621, China
He Jiang, JiFeng Xuan & ZhiLei Ren
School of Computer Science and Software, Hebei University of Technology, Tianjin, 300130, China
YouXi Wu
Computer Science Department, University of Vermont, Burlington, Vermont, 05403, USA
XinDong Wu

Corresponding author

Correspondence to He Jiang.

About this article

Cite this article

Jiang, H., Xuan, J., Ren, Z. et al. Misleading classification. Sci. China Inf. Sci. 57, 1–17 (2014). https://doi.org/10.1007/s11432-013-4798-5

Received: 16 August 2013
Accepted: 14 November 2013
Published: 25 January 2014
Issue Date: May 2014
DOI: https://doi.org/10.1007/s11432-013-4798-5

Similar content being viewed by others

On Selection Bias with Imbalanced Classes

PLVI-CE: a multi-label active learning algorithm with simultaneously considering uncertainty and diversity

C4.5 or Naive Bayes: A Discriminative Model Selection Approach