Abstract
Many data mining applications, such as spam filtering and intrusion detection, face active adversaries. In all these applications, the future data sets and the training data set no longer come from the same population, because of the transformations employed by the adversaries. Hence a main assumption of existing classification techniques no longer holds, and initially successful classifiers degrade easily. This becomes a game between the adversary and the data miner: the adversary modifies its strategy to avoid being detected by the current classifier; the data miner then updates its classifier based on the new threats. In this paper, we investigate the possibility of an equilibrium in this seemingly never-ending game, where neither party has an incentive to change. At such an equilibrium, modifying the classifier causes too many false positives with too little increase in true positives, while further changes by the adversary decrease the utility of the false negative items that are not detected. We develop a game-theoretic framework in which the equilibrium behavior of adversarial classification applications can be analyzed, and provide solutions for finding an equilibrium point. A classifier's equilibrium performance indicates its eventual success or failure. The data miner could then select attributes based on their equilibrium performance, and construct an effective classifier. A case study on online lending data demonstrates how to apply the proposed game-theoretic framework to a real application.
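To make the notion of an equilibrium concrete, the following is a minimal Python sketch of a toy one-attribute game, not the framework developed in the article. Everything in it is an assumption made for illustration: legitimate items follow N(0, 1), malicious items start at N(3, 1), the adversary may shift its mean toward the legitimate class and loses utility linearly as it does so, the data miner flags items above a threshold, and the miner commits first while the adversary best-responds (a leader-follower setup). The names `MU_BAD`, `SHIFTS`, `THRESHOLDS`, and the cost weights are hypothetical.

```python
import math

MU_BAD = 3.0                                       # assumed initial mean of the malicious class
SHIFTS = [i / 20 for i in range(61)]               # adversary moves: shift mean by 0.0 .. 3.0
THRESHOLDS = [-1.0 + i / 20 for i in range(101)]   # miner moves: threshold -1.0 .. 4.0


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def adversary_payoff(shift: float, threshold: float) -> float:
    """Utility kept after transforming, times the chance of evading detection."""
    utility = 1.0 - shift / MU_BAD                 # assumed linear loss of utility
    undetected = norm_cdf(threshold - (MU_BAD - shift))
    return utility * undetected


def best_response(threshold: float) -> float:
    """Adversary's best shift against a fixed threshold (grid search)."""
    return max(SHIFTS, key=lambda s: adversary_payoff(s, threshold))


def miner_cost(threshold: float, shift: float,
               c_fp: float = 1.0, c_fn: float = 1.0) -> float:
    """Assumed miner cost: weighted false positive rate plus false negative rate."""
    false_pos = 1.0 - norm_cdf(threshold)          # legitimate items flagged
    false_neg = norm_cdf(threshold - (MU_BAD - shift))  # malicious items missed
    return c_fp * false_pos + c_fn * false_neg


def stackelberg_equilibrium():
    """Miner picks the threshold that is cheapest once the adversary best-responds."""
    best = None
    for h in THRESHOLDS:
        s = best_response(h)
        cost = miner_cost(h, s)
        if best is None or cost < best[2]:
            best = (h, s, cost)
    return best


if __name__ == "__main__":
    h, s, cost = stackelberg_equilibrium()
    print(f"threshold={h:.2f}, adversary shift={s:.2f}, miner cost={cost:.3f}")
```

At the returned pair, the adversary cannot improve its payoff by choosing another shift from the grid, and no other threshold on the grid gives the miner a lower cost once the adversary's reaction is taken into account. Comparing the equilibrium cost across candidate attributes is one way to read the abstract's suggestion of selecting attributes by their equilibrium performance, under the toy assumptions stated above.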
Acknowledgements
We thank the reviewers and the editors for their helpful comments that improved the presentation and the content of the article. This work was partially supported by Air Force Office of Scientific Research MURI Grant FA9550 08-1-0265, National Institutes of Health Grant 1R01LM009989, National Science Foundation Grants Career-0845803, DMS-0904548 and CNS-0964350.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Additional information
Responsible editor: Johannes Fürnkranz.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Kantarcıoğlu, M., Xi, B. & Clifton, C. Classifier evaluation and attribute selection against active adversaries. Data Min Knowl Disc 22, 291–335 (2011). https://doi.org/10.1007/s10618-010-0197-3