Abstract
The Covid-19 pandemic has driven a substantial increase in online activity and transactions across the globe. As a consequence, cyber-attacks, particularly those leveraging email as the preferred attack vector, have also increased exponentially since Q1 2020. Despite this, email remains a popular communication tool. To reduce the amount of spam entering a user's inbox, many email providers have incorporated spam filters into their products. However, many commercial spam filters rely on a human to train the filter, leaving a margin of risk if sufficient training has not occurred. Knowing this, hackers employ more targeted and nuanced obfuscation methods to bypass built-in spam filters. In response to this continued problem, there is a growing body of research on the use of machine learning techniques for spam filtering. In many cases, detection results have shown great promise, but the methods often still rely on human input to classify training datasets. In this study, we specifically explore the use of deep learning as a method of reducing the human input required for spam detection. First, we evaluate the efficacy of popular freeware spam detection tools. Next, we narrow down machine learning techniques to select the appropriate method for our dataset. We then compare the accuracy of the selected method with that of the freeware spam detection tools. Our results show that our deep learning model, based on simple word embedding and global max pooling (SWEM-max), had higher accuracy (98.41%) than both Thunderbird (95%) and Mailwasher (92%), which are based on Bayesian spam filtering. Finally, we consider whether this improvement is enough to justify removing human input from spam email detection.
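The abstract names the SWEM-max model but does not describe its architecture. As a rough illustration only, the sketch below shows the general idea of global max pooling over word embeddings (the SWEM-max operation described by Shen et al., 2018), followed by a simple logistic scoring step. The embedding values, the three-dimensional vector size, the weights, and the function names `swem_max` and `spam_probability` are all assumptions made for this example; they are not taken from the chapter and do not reproduce the authors' trained model.

```python
import math

# Toy 3-dimensional word embeddings. These values are invented for
# illustration; a real model would learn them during training.
EMBEDDINGS = {
    "free":    [0.9, -0.2, 0.1],
    "prize":   [0.4, 0.8, -0.5],
    "meeting": [-0.7, 0.1, 0.6],
}
DIM = 3


def swem_max(tokens, embeddings=EMBEDDINGS, dim=DIM):
    """Global max pooling: take the largest value in each embedding
    dimension across all known tokens of the message."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # No known words: fall back to a zero vector.
        return [0.0] * dim
    return [max(vec[i] for vec in vectors) for i in range(dim)]


def spam_probability(features, weights, bias):
    """Logistic score on the pooled vector (hypothetical classifier head)."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    pooled = swem_max("free prize inside".split())
    print(pooled)
    # The weights and bias below are arbitrary, untrained values.
    print(spam_probability(pooled, weights=[1.5, 1.0, -0.5], bias=-1.0))
```

Because the pooling keeps only the strongest signal per dimension, the resulting vector has a fixed length regardless of message length, which is what allows a small classifier to sit on top of it without hand-built features.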
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Nicho, M., Majdani, F., McDermott, C.D. (2022). Replacing Human Input in Spam Email Detection Using Deep Learning. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2022. Lecture Notes in Computer Science, vol 13336. Springer, Cham. https://doi.org/10.1007/978-3-031-05643-7_25
DOI: https://doi.org/10.1007/978-3-031-05643-7_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05642-0
Online ISBN: 978-3-031-05643-7
eBook Packages: Computer Science, Computer Science (R0)