Utilising Machine Learning Against Email Phishing to Detect Malicious Emails

Part of the Advanced Sciences and Technologies for Security Applications book series (ASTSA)

Abstract

Phishing is an identity theft strategy in which users receive bogus emails from fraudulent accounts that claim to belong to a legitimate company, with the aim of stealing the recipient's sensitive information. This places many users' privacy at risk, so researchers continue to work on improving existing detection tools. Classification is one of the machine learning methods that can be used to screen incoming emails. This study discusses and compares classification algorithms such as Naïve Bayes and Support Vector Machine (SVM). A new method for detecting phishing emails has been developed by combining supervised and unsupervised strategies. The research also compares the classes assigned to emails manually and automatically. Sets of terms are used as features to distinguish malicious from non-malicious messages. The accuracy of the different classifiers in predicting the class attribute has been compared, and the SVM approach achieves better classification and lower misclassification rates for malicious emails than the Naïve Bayes method. A precision of 98% has been achieved to date, and a larger training corpus may increase it further. The research also investigates whether email phishing accelerated during the pandemic, and highlights that phishing sensitivity depends on the protocols used in this research. Its key purpose is to present a technique or algorithm that analyses mailbox content in order to classify each email as phishing or genuine. Machine learning is a branch of Artificial Intelligence (AI) that uses data mining to recognise new or existing patterns (features) in a data set, which are then used for classification. This study discusses the evolution and types of phishing attacks.
It examines the machine learning techniques and methods currently in use, analyses a framework for avoiding phishing, and recommends ways in which email phishing detection can be improved. Furthermore, it highlights the important role of human behaviour, for example working from home during the pandemic.
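The abstract compares Naïve Bayes and SVM classifiers over term features and reports precision, but it does not give the chapter's feature set or training setup. The following is therefore only a minimal, illustrative sketch, not the authors' implementation. It trains a multinomial Naïve Bayes model and a linear SVM on a few hand-made toy emails represented as bags of words, then computes precision as TP / (TP + FP). The toy messages, the tokenizer, the hyperparameters `lam` and `epochs`, and the choice of a Pegasos-style stochastic subgradient solver for the SVM (run in a fixed order and without a bias term) are all assumptions made for illustration. Only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the smoothed log likelihood of each token under class c.
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(text):
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def train_linear_svm(texts, labels, lam=0.01, epochs=50):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style subgradient steps.

    Features are binary word-presence indicators; labels 0/1 are mapped to -1/+1.
    No bias term is learned, and samples are visited in a fixed order for reproducibility.
    """
    data = [(set(tokenize(x)), 1 if y == 1 else -1) for x, y in zip(texts, labels)]
    w = {}
    t = 0
    for _ in range(epochs):
        for feats, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(w.get(f, 0.0) for f in feats)
            for f in w:  # shrink every weight (regularisation step)
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss is active: step towards the correct side
                for f in feats:
                    w[f] = w.get(f, 0.0) + eta * y
    return w


def svm_predict(w, text):
    """Return 1 (phishing) if the linear score is positive, else 0 (legitimate)."""
    score = sum(w.get(f, 0.0) for f in set(tokenize(text)))
    return 1 if score > 0 else 0


def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); returns 0.0 when nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0


# Hypothetical toy corpus: 1 = phishing, 0 = legitimate.
train_texts = [
    "verify your account password immediately",
    "urgent update your bank account details",
    "click here to claim your prize",
    "meeting agenda for monday attached",
    "please review the project report",
    "lunch with the team on friday",
]
train_labels = [1, 1, 1, 0, 0, 0]

nb = NaiveBayes().fit(train_texts, train_labels)
svm_w = train_linear_svm(train_texts, train_labels)

test_texts = ["verify your password now", "review the report on monday"]
test_labels = [1, 0]
nb_preds = [nb.predict(t) for t in test_texts]
svm_preds = [svm_predict(svm_w, t) for t in test_texts]
print("Naive Bayes:", nb_preds, "precision:", precision(test_labels, nb_preds))
print("Linear SVM: ", svm_preds, "precision:", precision(test_labels, svm_preds))
```

On a corpus this small both models simply learn to separate the two vocabularies, so the sketch says nothing about which classifier is better. Reproducing a comparison like the one described in the abstract would need a real labelled email corpus, a held-out test set and, in practice, a tested library implementation of both classifiers.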

Keywords

  • Phishing
  • Data mining
  • Clustering
  • Machine learning
  • Support vector machine
  • Naïve Bayes

Chapter length: 30 pages

Figs. 1–18 (images not available in this preview)

Author information

Corresponding author

Correspondence to Hamid Jahankhani.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Parmar, Y.S., Jahankhani, H. (2021). Utilising Machine Learning Against Email Phishing to Detect Malicious Emails. In: Montasari, R., Jahankhani, H. (eds) Artificial Intelligence in Cyber Security: Impact and Implications. Advanced Sciences and Technologies for Security Applications. Springer, Cham. https://doi.org/10.1007/978-3-030-88040-8_3

  • DOI: https://doi.org/10.1007/978-3-030-88040-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88039-2

  • Online ISBN: 978-3-030-88040-8

  • eBook Packages: Computer Science, Computer Science (R0)