Classifier evaluation and attribute selection against active adversaries

Kantarcıoğlu, Murat; Xi, Bowei; Clifton, Chris

doi:10.1007/s10618-010-0197-3

Open access
Published: 12 August 2010

Volume 22, pages 291–335 (2011)

Data Mining and Knowledge Discovery

Abstract

Many data mining applications, such as spam filtering and intrusion detection, face active adversaries. In these applications, future data sets and the training data set no longer come from the same population, because of the transformations the adversaries employ. Hence a key assumption of existing classification techniques no longer holds, and initially successful classifiers degrade easily. This becomes a game between the adversary and the data miner: the adversary modifies its strategy to avoid detection by the current classifier; the data miner then updates its classifier based on the new threats. In this paper, we investigate the possibility of an equilibrium in this seemingly never-ending game, where neither party has an incentive to change. At such an equilibrium, modifying the classifier causes too many false positives with too little increase in true positives, while further changes by the adversary decrease the utility of the false negatives (the items that evade detection). We develop a game-theoretic framework in which the equilibrium behavior of adversarial classification applications can be analyzed, and provide solutions for finding an equilibrium point. A classifier's equilibrium performance indicates its eventual success or failure. The data miner can then select attributes based on their equilibrium performance and construct an effective classifier. A case study on online lending data demonstrates how to apply the proposed game-theoretic framework to a real application.
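
The equilibrium idea in the abstract can be illustrated with a deliberately small toy game. The sketch below is a hypothetical illustration, not the authors' framework or solution method. It assumes one-dimensional Gaussian classes, a single-threshold classifier, a sequential game in which the adversary moves first and the data miner best-responds (a Stackelberg-style setup), equal class priors, and a linear penalty that lowers the value of each undetected item as the adversary's transformation grows. All names and parameter values (GOOD_MEAN, BAD_MEAN, PENALTY, stackelberg, and so on) are illustrative assumptions.

```python
import math

# Hypothetical toy parameters (illustrative assumptions, not from the paper).
GOOD_MEAN = 0.0   # legitimate items ~ N(0, 1)
BAD_MEAN = 3.0    # untransformed adversarial items ~ N(3, 1)
FP_COST = 1.0     # data miner's weight on the false positive rate
FN_COST = 1.0     # data miner's weight on the false negative rate
PENALTY = 0.3     # adversary's value lost per unit of shift


def norm_cdf(x, mean=0.0, sd=1.0):
    """Normal CDF computed with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def miner_cost(t, alpha):
    """Cost of flagging x > t when bad items are shifted down by alpha.
    Summing the two rates implicitly assumes equal class priors."""
    fp = 1.0 - norm_cdf(t, GOOD_MEAN)        # good items above t
    fn = norm_cdf(t, BAD_MEAN - alpha)       # shifted bad items at or below t
    return FP_COST * fp + FN_COST * fn


def miner_best_response(alpha, thresholds):
    """Follower: pick the threshold with the lowest cost against this shift."""
    return min(thresholds, key=lambda t: miner_cost(t, alpha))


def adversary_payoff(alpha, t):
    """Fraction of bad items that evade detection, each worth less as alpha grows."""
    evaded = norm_cdf(t, BAD_MEAN - alpha)
    value_per_item = max(0.0, 1.0 - PENALTY * alpha)
    return evaded * value_per_item


def stackelberg(alphas, thresholds):
    """Leader (adversary) anticipates the miner's best response to each shift
    and keeps the shift with the highest payoff. Returns (alpha, t, payoff)."""
    best = None
    for alpha in alphas:
        t = miner_best_response(alpha, thresholds)
        payoff = adversary_payoff(alpha, t)
        if best is None or payoff > best[2]:
            best = (alpha, t, payoff)
    return best


if __name__ == "__main__":
    alphas = [i * 0.05 for i in range(61)]              # shifts 0.0 .. 3.0
    thresholds = [i * 0.01 for i in range(-200, 501)]   # thresholds -2.0 .. 5.0
    alpha, t, payoff = stackelberg(alphas, thresholds)
    print(f"shift={alpha:.2f} threshold={t:.2f} adversary payoff={payoff:.4f}")
```

In this toy, the miner's best response is the midpoint between the two class means, so the threshold follows the adversary's shift; the adversary stops shifting once the extra evasion no longer outweighs the value lost per item, which mirrors the trade-off described above. Exhaustive grid search is used only because each player has a single parameter here; larger strategy spaces would call for a different search method.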

Acknowledgements

We thank the reviewers and the editors for their helpful comments that improved the presentation and the content of the article. This work was partially supported by Air Force Office of Scientific Research MURI Grant FA9550 08-1-0265, National Institutes of Health Grant 1R01LM009989, National Science Foundation Grants Career-0845803, DMS-0904548 and CNS-0964350.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Author information

Authors and Affiliations

Computer Science Department, University of Texas at Dallas, Richardson, TX, USA
Murat Kantarcıoğlu
Department of Statistics, Purdue University, West Lafayette, IN, USA
Bowei Xi
Department of Computer Science, Purdue University, West Lafayette, IN, USA
Chris Clifton

Corresponding author

Correspondence to Murat Kantarcıoğlu.

Additional information

Responsible editor: Johannes Fürnkranz.

About this article

Cite this article

Kantarcıoğlu, M., Xi, B. & Clifton, C. Classifier evaluation and attribute selection against active adversaries. Data Min Knowl Disc 22, 291–335 (2011). https://doi.org/10.1007/s10618-010-0197-3

Received: 16 June 2008
Accepted: 17 July 2010
Published: 12 August 2010
Issue Date: January 2011
DOI: https://doi.org/10.1007/s10618-010-0197-3
