
A Hybrid Approach Based on Particle Swarm Optimization and Random Forests for E-Mail Spam Filtering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9875)

Abstract

The Internet is flooded every day with a huge number of spam emails. As a result, users spend considerable time and effort managing their mailboxes to separate legitimate from spam emails, which can considerably reduce their productivity. Therefore, in the last decade, many researchers and practitioners have proposed different approaches to increase the effectiveness and safety of spam filtering models. In this paper, we propose a spam filtering approach consisting of two main stages: feature selection and email classification. In the first stage, a Particle Swarm Optimization (PSO) based wrapper feature selection method is used to select the most representative subset of features, reducing the large number of measured features. In the second stage, a Random Forest spam filtering model is developed using the features selected in the first stage. Experimental results on a real-world spam data set show that the proposed method outperforms five traditional machine learning approaches from the literature. Furthermore, four cost functions are used to evaluate the proposed spam filtering method. The results reveal that the PSO-based wrapper with Random Forest can be used effectively for spam detection.
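The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. To keep the block dependent only on the Python standard library, a nearest-centroid classifier stands in for the Random Forest, and the data are synthetic. The function names, the PSO parameters (`w`, `c1`, `c2`, swarm size, iteration count), the velocity clamp, and the subset-size penalty in `fitness` are all illustrative assumptions.

```python
import math
import random

def make_toy_data(n=200, n_features=8, n_informative=2, seed=0):
    """Synthetic two-class data; only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def centroid_accuracy(X_tr, y_tr, X_te, y_te, mask):
    """Holdout accuracy of a nearest-centroid classifier on the selected features.
    Assumption: this simple classifier is a stand-in for a Random Forest."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_te, y_te):
        dist = {c: sum((x[j] - cents[c][k]) ** 2 for k, j in enumerate(cols))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == t
    return correct / len(y_te)

def binary_pso(fitness, n_features, n_particles=10, n_iters=20,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: each particle is a 0/1 feature mask; a sigmoid of the
    velocity gives the probability that a bit is set to 1."""
    rng = random.Random(seed)
    pos = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to avoid saturation
                pos[i][j] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j]))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Stage 1: wrapper feature selection (the classifier's accuracy is the fitness).
X, y = make_toy_data()
X_tr, y_tr, X_te, y_te = X[:140], y[:140], X[140:], y[140:]

def fitness(mask):
    # Assumed objective: holdout accuracy minus a small penalty per selected feature.
    return centroid_accuracy(X_tr, y_tr, X_te, y_te, mask) - 0.01 * sum(mask)

best_mask, best_fit = binary_pso(fitness, n_features=8)

# Stage 2: train the final model on the selected features only.
selected = [j for j, keep in enumerate(best_mask) if keep]
final_acc = centroid_accuracy(X_tr, y_tr, X_te, y_te, best_mask)
print("selected features:", selected, "holdout accuracy:", final_acc)
```

To move closer to the setup in the abstract, the stand-in classifier could be replaced by a Random Forest (for example scikit-learn's `RandomForestClassifier`), and the single holdout split by cross-validation, so that the selected subset is not tuned to one test partition.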


Notes

  1. https://spamassassin.apache.org/publiccorpus/


Author information

Corresponding author

Correspondence to Ibrahim Aljarah.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Faris, H., Aljarah, I., Al-Shboul, B. (2016). A Hybrid Approach Based on Particle Swarm Optimization and Random Forests for E-Mail Spam Filtering. In: Nguyen, N.T., Iliadis, L., Manolopoulos, Y., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2016. Lecture Notes in Computer Science, vol 9875. Springer, Cham. https://doi.org/10.1007/978-3-319-45243-2_46

  • DOI: https://doi.org/10.1007/978-3-319-45243-2_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45242-5

  • Online ISBN: 978-3-319-45243-2

  • eBook Packages: Computer Science, Computer Science (R0)
