Replacing Human Input in Spam Email Detection Using Deep Learning

  • Conference paper
Artificial Intelligence in HCI (HCII 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13336)

Abstract

The Covid-19 pandemic has driven a substantial increase in online activity and transactions across the globe. As a consequence, cyber-attacks, particularly those using email as the preferred attack vector, have also increased exponentially since Q1 2020. Despite this, email remains a popular communication tool. To reduce the amount of spam entering a user's inbox, many email providers have incorporated spam filters into their products. However, many commercial spam filters rely on a human to train the filter, leaving a margin of risk if sufficient training has not occurred. Knowing this, hackers employ more targeted and nuanced obfuscation methods to bypass built-in spam filters. In response to this continued problem, there is a growing body of research on the use of machine learning techniques for spam filtering. In many cases, detection results have shown great promise, but the approaches often still rely on human input to label training datasets. In this study, we explore the use of deep learning as a method of reducing the human input required for spam detection. First, we evaluate the efficacy of popular freeware spam detection methods, tools, and techniques. Next, we narrow down machine learning techniques to select an appropriate method for our dataset. We then compare its accuracy with that of the freeware spam detection tools. Our results show that our deep learning model, based on simple word embedding and global max pooling (SWEM-max), achieved higher accuracy (98.41%) than both Thunderbird (95%) and Mailwasher (92%), which are based on Bayesian spam filtering. Finally, we consider whether this improvement is enough to justify removing human input from spam email detection.
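
To make the "simple word embedding and global max pooling" idea concrete, the following is a minimal sketch of a SWEM-max style forward pass in plain Python. It is not the authors' implementation: the vocabulary, the embedding dimension, the random embeddings, and the output weights are all hypothetical placeholders, and a real model would learn the embeddings and weights from labelled email data.

```python
import math
import random


def build_embeddings(vocab, dim, seed=0):
    """Give each word a random vector (assumption: stands in for learned embeddings)."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in vocab}


def swem_max(tokens, embeddings, dim):
    """Global max pooling: per-dimension maximum over the word vectors of a message."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [max(v[i] for v in vectors) for i in range(dim)]


def spam_probability(pooled, weights, bias):
    """Logistic output layer applied to the pooled message vector."""
    z = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy vocabulary and untrained weights, for illustration only.
vocab = ["free", "prize", "click", "meeting", "agenda", "tomorrow"]
DIM = 4
emb = build_embeddings(vocab, DIM)

tokens = "click for free prize".split()  # "for" is out of vocabulary and is skipped
pooled = swem_max(tokens, emb, DIM)
weights = [0.5, -0.25, 0.75, 0.1]
p = spam_probability(pooled, weights, bias=0.0)
```

Because the pooling step takes a maximum over words, the pooled vector does not depend on word order or message length, which is one reason this family of models is cheap to train compared with recurrent or convolutional text classifiers.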

Notes

  1. https://www.firetrust.com/products/mailwasher-pro

  2. https://support.mozilla.org/en-US/kb/thunderbird-and-junk-spam-messages/

  3. https://www.kaggle.com/

  4. https://www.hmailserver.com/

Corresponding author

Correspondence to Mathew Nicho.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Nicho, M., Majdani, F., McDermott, C.D. (2022). Replacing Human Input in Spam Email Detection Using Deep Learning. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2022. Lecture Notes in Computer Science, vol 13336. Springer, Cham. https://doi.org/10.1007/978-3-031-05643-7_25

  • DOI: https://doi.org/10.1007/978-3-031-05643-7_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05642-0

  • Online ISBN: 978-3-031-05643-7

  • eBook Packages: Computer Science, Computer Science (R0)