Abstract
The Internet is flooded every day with a huge number of spam emails. As a result, users spend considerable time and effort managing their mailboxes to separate legitimate emails from spam, which can substantially reduce their productivity. Over the last decade, many researchers and practitioners have therefore proposed approaches to increase the effectiveness and safety of spam filtering models. In this paper, we propose a spam filtering approach consisting of two main stages: feature selection and email classification. In the first stage, a Particle Swarm Optimization (PSO) based Wrapper Feature Selection is used to select the most representative subset of features, reducing the large number of measured features. In the second stage, a Random Forest spam filtering model is built using the features selected in the first stage. Experimental results on a real-world spam data set show that the proposed method outperforms five other traditional machine learning approaches from the literature. Furthermore, four cost functions are used to evaluate the proposed spam filtering method. The results indicate that the PSO-based Wrapper combined with Random Forest can be used effectively for spam detection.
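The two-stage idea described above (a PSO wrapper searches over feature subsets, and a classifier scores each candidate subset) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: it uses a standard sigmoid-transfer binary PSO, a tiny nearest-centroid classifier as a stand-in for Random Forest so that the example needs only the standard library, synthetic data instead of a spam corpus, and hypothetical names and parameter values (`binary_pso_select`, `holdout_accuracy`, `make_toy_data`, `w`, `c1`, `c2`).

```python
import math
import random

def binary_pso_select(evaluate, n_features, n_particles=12, n_iters=30,
                      w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a 0/1 feature mask that maximizes evaluate(mask)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best masks
    pbest_fit = [evaluate(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-6.0, min(6.0, v))       # clamp the velocity
                vel[i][d] = v
                # Sigmoid transfer: velocity becomes P(bit == 1).
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            fit = evaluate(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

def holdout_accuracy(mask, X_train, y_train, X_test, y_test):
    """Wrapper fitness: hold-out accuracy of a nearest-centroid
    classifier (stand-in for Random Forest) on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0                               # empty subset is useless
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, y in zip(X_test, y_test):
        pred = min(centroids, key=lambda lab: sum(
            (x[j] - c) ** 2 for j, c in zip(cols, centroids[lab])))
        correct += pred == y
    return correct / len(y_test)

def make_toy_data(n, rng):
    """Synthetic data: features 0-1 carry the label, 2-7 are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [2.0 * label + rng.gauss(0, 0.3) for _ in range(2)]
        noise = [rng.gauss(0, 3.0) for _ in range(6)]
        X.append(informative + noise)
        y.append(label)
    return X, y

rng = random.Random(42)
X_train, y_train = make_toy_data(120, rng)
X_test, y_test = make_toy_data(60, rng)
fitness = lambda mask: holdout_accuracy(mask, X_train, y_train, X_test, y_test)
best_mask, best_fit = binary_pso_select(fitness, n_features=8)
print("selected mask:", best_mask, "hold-out accuracy:", round(best_fit, 3))
```

In a setting closer to the one described, `holdout_accuracy` would be replaced by the cross-validated score of a Random Forest trained on the masked columns, and the final model would be trained once more on the subset the swarm returns.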
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Faris, H., Aljarah, I., Al-Shboul, B. (2016). A Hybrid Approach Based on Particle Swarm Optimization and Random Forests for E-Mail Spam Filtering. In: Nguyen, NT., Iliadis, L., Manolopoulos, Y., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2016. Lecture Notes in Computer Science, vol 9875. Springer, Cham. https://doi.org/10.1007/978-3-319-45243-2_46
DOI: https://doi.org/10.1007/978-3-319-45243-2_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45242-5
Online ISBN: 978-3-319-45243-2
eBook Packages: Computer Science (R0)