Evolutionary Multi-objective Scheduling for Anti-Spam Filtering Throughput Optimization

  • David Ruano-Ordás
  • Vitor Basto-Fernandes
  • Iryna Yevseyeva
  • José Ramón Méndez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10334)

Abstract

This paper presents an evolutionary multi-objective optimization formulation of the anti-spam filtering problem that addresses both classification quality (False Positive and False Negative error rates) and the minimization of email message classification time. This approach is compared with single-objective problem formulations found in the literature, and its advantages for decision support and flexible/adaptive anti-spam filter configuration are demonstrated. A study is performed using the Wirebrush4SPAM anti-spam filtering framework and the SpamAssassin email dataset. The NSGA-II evolutionary multi-objective optimization algorithm was applied to validate and demonstrate this novel approach to the anti-spam filtering optimization problem. The results of the experiments show that this optimization strategy allows the decision maker (the anti-spam filtering system administrator) to select among a set of optimal and flexible filter configuration alternatives with respect to classification quality and classification efficiency.
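The idea of choosing among a set of optimal alternatives can be illustrated with a minimal sketch of Pareto dominance over the three objectives named in the abstract (False Positive rate, False Negative rate, classification time, all minimized). This is not the paper's implementation and it is not NSGA-II itself; it only shows the non-dominated filtering step that such algorithms build on. The `Candidate` type, the configuration names, and all numeric values are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One hypothetical filter configuration and its three objective values."""
    name: str
    fp_rate: float  # fraction of legitimate messages flagged as spam
    fn_rate: float  # fraction of spam messages let through
    time_ms: float  # mean classification time per message (milliseconds)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    pa = (a.fp_rate, a.fn_rate, a.time_ms)
    pb = (b.fp_rate, b.fn_rate, b.time_ms)
    no_worse = all(x <= y for x, y in zip(pa, pb))
    strictly_better = any(x < y for x, y in zip(pa, pb))
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical objective values, not results from the paper.
candidates = [
    Candidate("A", 0.01, 0.10, 40.0),
    Candidate("B", 0.02, 0.05, 55.0),
    Candidate("C", 0.02, 0.12, 60.0),  # worse than A on all three objectives
    Candidate("D", 0.05, 0.05, 20.0),
]

front = pareto_front(candidates)
print([c.name for c in front])  # → ['A', 'B', 'D']
```

In this toy data, configuration C is discarded because A beats it on every objective, while A, B, and D each represent a different trade-off that a system administrator could reasonably prefer.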

Keywords

Rule-based anti-spam systems; Scheduling; Multi-objective optimization

Notes

Acknowledgements

The SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) of the University of Vigo for hosting its IT infrastructure.

Funding: This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia) and FEDER (European Union). It was also partially supported by the project "Platform of integration of intelligent techniques for analysis of biomedical information" (TIN2013-47153-C3-3-R) from the Spanish Ministry of Economy and Competitiveness.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • David Ruano-Ordás (1, 2)
  • Vitor Basto-Fernandes (3, 4)
  • Iryna Yevseyeva (5)
  • José Ramón Méndez (1, 2)

  1. Department of Computer Science, University of Vigo, ESEI, Ourense, Spain
  2. Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Vigo, Spain
  3. Instituto Universitário de Lisboa (ISCTE-IUL), University Institute of Lisbon, ISTAR-IUL, Lisbon, Portugal
  4. School of Technology and Management, Computer Science and Communications Research Centre, Polytechnic Institute of Leiria, Leiria, Portugal
  5. School of Computer Science and Informatics, Faculty of Technology, De Montfort University, Leicester, UK
