Spam Email Filtering Using Network-Level Properties
Spam is a serious problem that affects email users (e.g. phishing attacks, viruses and time spent reading unwanted messages). We propose a novel spam email filtering approach based on network-level attributes (e.g. the geographic coordinates of the sender's IP address) that are more persistent over time than message content. This approach was tested using two classifiers, Naive Bayes (NB) and Support Vector Machines (SVM), and compared against bag-of-words models and eight blacklists. Several experiments were conducted with recently collected legitimate (ham) and non-legitimate (spam) messages, in order to simulate distinct user profiles from two countries (USA and Portugal). Overall, the network-level based SVM model achieved the best discriminatory performance. Moreover, preliminary results suggest that this method is more robust to phishing attacks.
Keywords: Anti-Spam filtering, Text Mining, Naive Bayes, Support Vector Machines
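To make the idea of classifying on network-level attributes concrete, the following is a minimal sketch of a Gaussian Naive Bayes classifier written in plain Python. It is not the authors' implementation: the feature set (sender latitude, sender longitude, and a hypothetical hop count), the toy training values, and the function names `fit_gaussian_nb` and `predict` are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and per-feature mean/variance for each label."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, 1e-6)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the label with the highest log-posterior for one message."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            # Log of the Gaussian density for this feature value.
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (sender latitude, sender longitude, hop count).
train_rows = [
    (38.7, -9.1, 6), (41.1, -8.6, 5), (40.2, -8.4, 7),        # ham
    (55.8, 37.6, 14), (39.9, 116.4, 17), (-23.5, -46.6, 15),  # spam
]
train_labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

model = fit_gaussian_nb(train_rows, train_labels)
print(predict(model, (39.0, -9.0, 6)))
```

Note that no message text enters the model at all; only attributes of the sending host are used, which is the property the abstract argues makes such features more persistent than content.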