Replacing Human Input in Spam Email Detection Using Deep Learning

Nicho, Mathew; Majdani, Farzan; McDermott, Christopher D.

doi:10.1007/978-3-031-05643-7_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13336)

Included in the following conference series:

International Conference on Human-Computer Interaction

Abstract

The Covid-19 pandemic has driven a substantial increase in online activity and transactions across the globe. As a consequence, cyber-attacks, particularly those using email as the preferred attack vector, have also increased exponentially since Q1 2020. Despite this, email remains a popular communication tool. To reduce the amount of spam reaching a user's inbox, many email providers have incorporated spam filters into their products. However, many commercial spam filters rely on a human to train the filter, which leaves a margin of risk if training has been insufficient. Aware of this, hackers also employ more targeted and nuanced obfuscation methods to bypass built-in spam filters. In response to this continuing problem, a growing body of research applies machine learning techniques to spam filtering. Many of these approaches have shown great promise in detection results, but they often still rely on human input to label the training datasets. In this study, we specifically explore deep learning as a method of reducing the human input required for spam detection. First, we evaluate the efficacy of popular freeware spam detection tools and techniques. Next, we narrow down the candidate machine learning techniques to select the method most appropriate for our dataset, and compare its accuracy with that of the freeware spam detection tools. Our results show that our deep learning model, based on simple word embeddings with global max pooling (SWEM-max), achieved higher accuracy (98.41%) than both Thunderbird (95%) and Mailwasher (92%), which are based on Bayesian spam filtering. Finally, we consider whether this improvement is sufficient to justify removing human input from spam email detection.
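To illustrate what a SWEM-max classifier computes, the following is a minimal Python sketch of its forward pass under stated assumptions; it is not the authors' implementation. It assumes a lower-case whitespace tokenizer, two-dimensional word embeddings, and a single sigmoid output unit. The vocabulary, the names `EMBEDDINGS`, `WEIGHTS`, `BIAS`, `tokenize`, `swem_max` and `spam_probability`, and all numeric values are hypothetical. In a trained model, the embeddings and output weights would be learned from labelled emails rather than set by hand.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 2-dimensional word embeddings, hand-picked for illustration.
# Dimension 0 loosely encodes "spammy" vocabulary and dimension 1
# "work-related" vocabulary; a real SWEM-max model learns these vectors.
EMBEDDINGS: Dict[str, List[float]] = {
    "free": [0.9, -0.2],
    "winner": [0.8, 0.1],
    "prize": [0.7, -0.1],
    "meeting": [-0.6, 0.8],
    "agenda": [-0.5, 0.7],
    "tomorrow": [-0.3, 0.5],
}

# Hypothetical logistic output layer: one weight per embedding dimension.
WEIGHTS: List[float] = [4.0, -3.0]
BIAS: float = -0.5


def tokenize(text: str) -> List[str]:
    """Lower-case whitespace tokenizer (an assumed, simplified preprocessing step)."""
    return text.lower().split()


def swem_max(tokens: Sequence[str], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Global max pooling: element-wise maximum of the word vectors over all tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]  # skip out-of-vocabulary words
    if not vectors:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [max(v[j] for v in vectors) for j in range(dim)]


def spam_probability(text: str) -> float:
    """Pool the email's word vectors, then apply a sigmoid output unit."""
    h = swem_max(tokenize(text), EMBEDDINGS, len(WEIGHTS))
    z = sum(w * x for w, x in zip(WEIGHTS, h)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    for email in ("Claim your FREE prize now", "Meeting agenda for tomorrow"):
        p = spam_probability(email)
        label = "spam" if p >= 0.5 else "ham"
        print(f"{email!r}: p(spam) = {p:.3f} -> {label}")
```

Because max pooling keeps, for each embedding dimension, the strongest signal contributed by any single word, one highly indicative token (here, "free") can dominate the document vector regardless of message length. In a framework such as Keras, the same pipeline is commonly expressed as an `Embedding` layer followed by `GlobalMaxPooling1D` and a `Dense` layer with a sigmoid activation.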

Author information

Authors and Affiliations

College of Technology Innovation, Zayed University, Dubai, UAE
Mathew Nicho
School of Computing, Robert Gordon University, Aberdeen, UK
Farzan Majdani & Christopher D. McDermott

Corresponding author

Correspondence to Mathew Nicho.

Editor information

Editors and Affiliations

Siemens (United States), Princeton, NJ, USA
Helmut Degen
Foundation for Research and Technology – Hellas (FORTH), Heraklion, Crete, Greece
Stavroula Ntoa

About this paper

Cite this paper

Nicho, M., Majdani, F., McDermott, C.D. (2022). Replacing Human Input in Spam Email Detection Using Deep Learning. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2022. Lecture Notes in Computer Science, vol 13336. Springer, Cham. https://doi.org/10.1007/978-3-031-05643-7_25

DOI: https://doi.org/10.1007/978-3-031-05643-7_25
Published: 15 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05642-0
Online ISBN: 978-3-031-05643-7
eBook Packages: Computer Science, Computer Science (R0)
