Compression for Anti-Adversarial Learning
We investigate the susceptibility of compression-based learning algorithms to adversarial attacks. We demonstrate that compression-based algorithms are surprisingly resilient to carefully crafted attacks that can easily devastate standard learning algorithms. In the worst case, where we assume the adversary has full knowledge of the training data, compression-based algorithms fail, as expected. We address this worst case by proposing a new technique that analyzes subsequences strategically extracted from the given data. In the domain of spam filtering, the technique achieves near-zero performance loss in the worst case.
Keywords: adversarial learning, data compression, subsequence differentiation
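For readers unfamiliar with compression-based learning, the general idea can be sketched as follows: a message is assigned to the class whose training text lets a compressor encode it most cheaply. The sketch below is only an illustration under stated assumptions. It uses Python's general-purpose `zlib` compressor as a stand-in, the function names (`compressed_size`, `extra_bytes`, `classify`) are hypothetical, and the toy corpora are made up. The abstract does not specify the compressor or scoring rule the authors use, and this sketch does not implement their subsequence-based technique.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))


def extra_bytes(corpus: bytes, message: bytes) -> int:
    """Additional compressed bytes needed to encode message after corpus."""
    return compressed_size(corpus + message) - compressed_size(corpus)


def classify(message: str, corpora: dict) -> str:
    """Return the label whose corpus compresses the message most cheaply."""
    msg = message.encode("utf-8")
    costs = {
        label: extra_bytes(text.encode("utf-8"), msg)
        for label, text in corpora.items()
    }
    return min(costs, key=costs.get)


# Toy, made-up training text (an assumption; not the paper's data).
corpora = {
    "spam": "win a free prize now. click here to claim your free prize. "
            "cheap pills online, limited offer, click here now. ",
    "ham": "the meeting is moved to tuesday afternoon. please review the "
           "attached report before the project meeting. see you at lunch. ",
}

label = classify("click here to claim your free prize", corpora)
print(label)
```

In this toy setting the test message occurs verbatim in the spam text, so appending it to that corpus costs very few extra compressed bytes. An adversary who pads a spam message with text typical of legitimate mail would lower its cost under the legitimate-mail corpus, which is the kind of attack the abstract refers to.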