Compression for Anti-Adversarial Learning

  • Yan Zhou
  • Meador Inge
  • Murat Kantarcioglu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6635)


We investigate the susceptibility of compression-based learning algorithms to adversarial attacks. We demonstrate that compression-based algorithms are surprisingly resilient to carefully plotted attacks that can easily devastate standard learning algorithms. In the worst case, where we assume the adversary has full knowledge of the training data, compression-based algorithms fail as expected. We tackle the worst case by proposing a new technique that analyzes subsequences strategically extracted from the given data. In the domain of spam filtering, this technique achieves near-zero performance loss in the worst case.


Keywords: adversarial learning, data compression, subsequence differentiation





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yan Zhou (1)
  • Meador Inge (2)
  • Murat Kantarcioglu (1)
  1. Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, USA
  2. Mentor Graphics Corporation, Mobile, USA
