A Hybrid Approach Based on Particle Swarm Optimization and Random Forests for E-Mail Spam Filtering

Faris, Hossam; Aljarah, Ibrahim; Al-Shboul, Bashar

doi:10.1007/978-3-319-45243-2_46

Conference paper
First Online: 20 September 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9875)

Abstract

The Internet is flooded every day with a huge number of spam emails. Users must therefore spend considerable time and effort managing their mailboxes to distinguish legitimate emails from spam, which can considerably reduce their productivity. Over the last decade, many researchers and practitioners have proposed approaches to increase the effectiveness and safety of spam filtering models. In this paper, we propose a spam filtering approach consisting of two main stages: feature selection and email classification. In the first stage, a Particle Swarm Optimization (PSO) based wrapper feature selection is used to select the most representative subset of features, reducing the large number of measured features. In the second stage, a Random Forest spam filtering model is developed using the features selected in the first stage. Experimental results on a real-world spam data set show that the proposed method outperforms five other traditional machine learning approaches from the literature. Furthermore, four cost functions are used to evaluate the proposed spam filtering method. The results reveal that the PSO-based wrapper with Random Forest can be used effectively for spam detection.
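The two-stage idea in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes the common sigmoid-based binary PSO update, which may differ from the PSO variant used in the paper, and it replaces the Random Forest with a tiny nearest-centroid classifier as the wrapper's evaluator so that the example runs with the standard library alone. All names (`binary_pso_select`, `centroid_accuracy`, `make_toy_data`) and all parameter values are hypothetical, and the data are synthetic.

```python
import math
import random


def centroid_accuracy(train, valid, mask):
    """Stand-in wrapper evaluator (NOT a Random Forest): validation accuracy
    of a nearest-centroid classifier using only features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    sums, counts = {}, {}
    for x, label in train:
        acc = sums.setdefault(label, [0.0] * len(cols))
        for k, j in enumerate(cols):
            acc[k] += x[j]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    hits = 0
    for x, label in valid:
        pred = min(centroids, key=lambda c: sum(
            (x[j] - centroids[c][k]) ** 2 for k, j in enumerate(cols)))
        hits += pred == label
    return hits / len(valid)


def binary_pso_select(fitness, n_features, n_particles=12, n_iters=25,
                      w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over 0/1 feature masks; maximizes fitness(mask)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))  # clamp the velocity
                # Sigmoid transfer: velocity sets the probability of bit = 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob_one else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def make_toy_data(n=200, n_features=8, seed=1):
    """Synthetic two-class data: features 0 and 1 are informative, rest noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        x[0] += 3.0 * label
        x[1] -= 3.0 * label
        rows.append((x, label))
    return rows


data = make_toy_data()
train, valid = data[:140], data[140:]
mask, score = binary_pso_select(
    lambda m: centroid_accuracy(train, valid, m), n_features=8)
selected = [j for j, bit in enumerate(mask) if bit]
print("selected features:", selected, "validation accuracy:", round(score, 3))
```

In a setup closer to the one described, the `fitness` callable would train and score a Random Forest on the candidate subset, and the final model would be fitted on the selected features and assessed on data not used during selection.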

Notes

1. https://spamassassin.apache.org/publiccorpus/

Author information

Authors and Affiliations

Business Information Technology Department, King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
Hossam Faris, Ibrahim Aljarah & Bashar Al-Shboul
The University of Jordan, Amman, Jordan
Ibrahim Aljarah & Bashar Al-Shboul

Corresponding author

Correspondence to Ibrahim Aljarah.

Editor information

Editors and Affiliations

Wrocław University of Technology, Wrocław, Poland
Ngoc-Thanh Nguyen
Aristotle University of Thessaloniki, Thessaloniki, Greece
Lazaros Iliadis
Department of Forestry and Management, Democritus University of Thrace, Orestiada, Thrace, Greece
Yannis Manolopoulos
Wrocław University of Technology, Wrocław, Poland
Bogdan Trawiński

About this paper

Cite this paper

Faris, H., Aljarah, I., Al-Shboul, B. (2016). A Hybrid Approach Based on Particle Swarm Optimization and Random Forests for E-Mail Spam Filtering. In: Nguyen, NT., Iliadis, L., Manolopoulos, Y., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2016. Lecture Notes in Computer Science, vol 9875. Springer, Cham. https://doi.org/10.1007/978-3-319-45243-2_46

DOI: https://doi.org/10.1007/978-3-319-45243-2_46
Published: 20 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45242-5
Online ISBN: 978-3-319-45243-2
eBook Packages: Computer Science, Computer Science (R0)
