SVM-Based Feature Selection and Classification for Email Filtering

  • Sebastián Maldonado
  • Gaston L’Huillier
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 204)


The email inbox is a dangerous place, but pattern recognition tools make it possible to filter out most of the unwanted messages that may harm end users. Because phishing and spam strategies are adversarial and change over time, the number of variables needed to classify email properly has increased substantially. For many years these factors have driven the pattern recognition and machine learning communities to keep improving email filtering techniques. This work presents an embedded feature selection approach that determines a non-linear decision boundary with minimal error and a reduced number of features by penalizing their use in the dual formulation of binary Support Vector Machines (SVMs). The proposed method optimizes the widths of an anisotropic RBF kernel via successive gradient descent steps, eliminating those features that have low relevance for the model. Experiments with two real-world spam and phishing data sets show that our approach performs better than well-known feature selection algorithms while consistently using fewer variables.
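To make the kernel idea concrete, here is a minimal sketch of an anisotropic RBF kernel with one width per feature, plus a pruning step that drops features whose width falls below a cutoff. It is an illustration only and not the authors' implementation: the kernel form exp(-Σ_j σ_j² (x_j − y_j)²), the function names `anisotropic_rbf_kernel` and `prune_features`, and the `threshold` value are all assumptions, and the gradient descent updates of the widths against the penalized dual objective are not shown.

```python
import numpy as np


def anisotropic_rbf_kernel(X, Y, sigma):
    """Gram matrix with K[i, k] = exp(-sum_j sigma[j]**2 * (X[i, j] - Y[k, j])**2).

    sigma holds one width parameter per feature (assumed parameterization).
    A feature with sigma[j] == 0 has no influence on the kernel value.
    """
    Xs = X * sigma  # scale every feature by its own width
    Ys = Y * sigma
    sq_dists = (
        np.sum(Xs ** 2, axis=1)[:, None]
        + np.sum(Ys ** 2, axis=1)[None, :]
        - 2.0 * Xs @ Ys.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0))


def prune_features(sigma, threshold):
    """Set widths below the (hypothetical) threshold to zero.

    Returns the pruned widths and a boolean mask of the features kept.
    """
    keep = np.abs(sigma) >= threshold
    return np.where(keep, sigma, 0.0), keep


# Toy data: two emails described by three made-up numeric features.
X = np.array([[0.0, 1.0, 5.0],
              [1.0, 1.0, -3.0]])
sigma = np.array([1.0, 0.5, 0.01])  # third feature has a near-zero width

sigma_pruned, keep = prune_features(sigma, threshold=0.1)
K = anisotropic_rbf_kernel(X, X, sigma_pruned)
```

In this toy example the third feature is removed, so the large difference between the two rows in that column no longer affects `K`. A Gram matrix built this way could be handed to any SVM solver that accepts a precomputed kernel.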


Spam and phishing filtering; Support vector machines; Feature selection; Embedded methods





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. School of Engineering and Applied Sciences, Universidad de los Andes, Las Condes, Chile
  2. Groupon, Inc., Palo Alto, U.S.A.
