Misleading Learners: Co-opting Your Spam Filter

Machine Learning in Cyber Trust

Using statistical machine learning to make security decisions introduces new vulnerabilities in large-scale systems. We show how an adversary can exploit statistical machine learning, as used in the SpamBayes spam filter, to render it useless, even if the adversary’s access is limited to only 1% of the spam training messages. We demonstrate three new attacks that, respectively, make the filter unusable, prevent victims from receiving specific email messages, and cause spam emails to arrive in the victim’s inbox.
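
The kind of training-data attack the abstract describes can be illustrated with a toy example. The sketch below is not the SpamBayes scoring rule and not the chapter's attack procedure; the class `ToyTokenFilter`, its smoothed per-token score, the 0.5 threshold, and the tiny training sets are all assumptions made for illustration, and the proportion of poisoned messages is far larger than the 1% quoted above. It only shows why spam-labeled messages stuffed with ordinary words can push a legitimate message's score toward "spam".

```python
from collections import Counter


class ToyTokenFilter:
    """Toy per-token spam scorer (hypothetical; NOT the SpamBayes algorithm)."""

    def __init__(self):
        self.spam = Counter()  # number of spam messages containing each token
        self.ham = Counter()   # number of ham messages containing each token

    def train(self, text, is_spam):
        counts = self.spam if is_spam else self.ham
        counts.update(set(text.lower().split()))

    def token_score(self, tok):
        # Smoothed fraction of training messages containing tok that were spam.
        s, h = self.spam[tok], self.ham[tok]
        return (s + 1) / (s + h + 2)

    def score(self, text):
        # Message score = mean token score; above 0.5 is treated as spam here.
        toks = set(text.lower().split())
        if not toks:
            return 0.5
        return sum(self.token_score(t) for t in toks) / len(toks)


def build_filter(poison_messages=()):
    f = ToyTokenFilter()
    for m in ["meeting agenda for monday",
              "project budget report attached",
              "lunch on monday"]:
        f.train(m, is_spam=False)
    for m in ["cheap pills online", "win money now", "cheap money offer"]:
        f.train(m, is_spam=True)
    # Attack messages are spam the victim trains on, so they are labeled spam.
    for m in poison_messages:
        f.train(m, is_spam=True)
    return f


legit = "monday project meeting agenda"
clean = build_filter()
# Poison: spam-labeled messages packed with words typical of legitimate mail.
poisoned = build_filter(["meeting agenda monday project budget report lunch"] * 5)

print(f"clean score:    {clean.score(legit):.3f}")
print(f"poisoned score: {poisoned.score(legit):.3f}")
```

In this toy setup the same legitimate message scores below the threshold before poisoning and above it afterwards, because every ordinary word it contains has been seen mostly in spam-labeled training data.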




Copyright information

© 2009 Springer-Verlag US

About this chapter

Cite this chapter

Nelson, B. et al. (2009). Misleading Learners: Co-opting Your Spam Filter. In: Machine Learning in Cyber Trust. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-88735-7_2


  • DOI: https://doi.org/10.1007/978-0-387-88735-7_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-88734-0

  • Online ISBN: 978-0-387-88735-7

  • eBook Packages: Computer Science (R0)
