Evolutionary Multi-objective Scheduling for Anti-Spam Filtering Throughput Optimization
This paper presents an evolutionary multi-objective optimization formulation of the anti-spam filtering problem that addresses both classification quality (False Positive and False Negative error rates) and the minimization of email message classification time. The approach is compared to single-objective formulations found in the literature, and its advantages for decision support and for flexible, adaptive anti-spam filter configuration are demonstrated. A study is performed using the Wirebrush4SPAM anti-spam filtering framework and the SpamAssassin email dataset. The NSGA-II evolutionary multi-objective optimization algorithm is applied to validate and demonstrate this multi-objective approach to the anti-spam filtering optimization problem. The experimental results show that this optimization strategy allows the decision maker (the anti-spam filtering system administrator) to select among a set of optimal and flexible filter configuration alternatives with respect to classification quality and classification efficiency.
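The notion of a "set of optimal filter configuration alternatives" can be illustrated with a minimal sketch of Pareto dominance over the three minimized objectives named above (False Positive rate, False Negative rate, classification time). This is not NSGA-II itself, only the dominance relation that such algorithms rely on. The candidate values, the function names `dominates` and `pareto_front`, and the time unit are hypothetical and chosen purely for illustration; they are not results from the study.

```python
from typing import List, Tuple

# Each candidate filter configuration is summarized by three objectives,
# all to be minimized: (FP rate, FN rate, classification time).
Objectives = Tuple[float, float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical configurations (illustrative numbers only).
candidates = [
    (0.01, 0.20, 35.0),
    (0.02, 0.10, 50.0),
    (0.02, 0.25, 60.0),  # worse than the first candidate on every objective
    (0.05, 0.05, 20.0),
]

front = pareto_front(candidates)
for fp, fn, t in front:
    print(f"FP={fp:.2f}  FN={fn:.2f}  time={t:.1f}")
```

In this sketch the third candidate is discarded because the first one is better on all three objectives, while the remaining three represent different trade-offs among which an administrator could choose.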
Keywords: Rule-based anti-spam systems, Scheduling, Multi-objective optimization
The SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) of the University of Vigo for hosting its IT infrastructure.
Funding: This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia) and FEDER (European Union). This work was partially supported by the project Platform of integration of intelligent techniques for analysis of biomedical information (TIN2013-47153-C3-3-R) from the Spanish Ministry of Economy and Competitiveness.