Determining the Most Effective Machine Learning Techniques for Detecting Phishing Websites

  • Conference paper
Applications of Artificial Intelligence and Machine Learning

Abstract

Consumer preferences have shifted from conventional shopping toward electronic commerce as the Internet has grown. Rather than robbing banks or shops, today’s criminals use a range of sophisticated cyber methods to find their victims. Attackers have developed new ways of deceiving customers, such as phishing, in which fake websites are used to gather sensitive information such as account IDs, usernames, and passwords. Because these attacks are semantic in nature and mainly exploit the vulnerabilities of computer users, establishing the authenticity of a web page is difficult. Machine learning (ML) is a widely used data analysis technique that has shown promising results in the battle against phishing. This article examines the applicability of machine learning methods for identifying phishing attempts, along with their advantages and disadvantages. Specifically, a variety of machine learning methods are explored to find appropriate anti-phishing solutions. More significantly, we tested a wide range of machine learning methods on real-world phishing datasets and evaluated them against several criteria. Six machine learning classification methods are employed to detect phishing websites. The Random Forest classifier achieved the highest accuracy in this study, 97.17%, followed by the Gradient Boosting classifier at 94.75%. The Decision Tree classifier reached an accuracy of 94.69%, Logistic Regression 92.76%, KNN 60.45%, and SVM 56.04%. Our results show that KNN has trouble detecting phishing sites, as reflected in its low accuracy. The Decision Tree classifier performs almost the same as Gradient Boosting.
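As a reading aid, the reported comparison can be tabulated and ranked in a few lines of Python. This is only a sketch built from the accuracy figures quoted in the abstract; it does not reproduce the authors' experiments, and the helper names `rank_by_accuracy` and `gap` are hypothetical, introduced here for illustration.

```python
# Accuracies (%) copied from the abstract; nothing here is re-measured.
reported_accuracy = {
    "Random Forest": 97.17,
    "Gradient Boosting": 94.75,
    "Decision Tree": 94.69,
    "Logistic Regression": 92.76,
    "KNN": 60.45,
    "SVM": 56.04,
}


def rank_by_accuracy(scores):
    """Return (name, accuracy) pairs sorted from highest to lowest accuracy."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def gap(scores, first, second):
    """Absolute accuracy difference between two classifiers, in percentage points."""
    return round(abs(scores[first] - scores[second]), 2)


ranking = rank_by_accuracy(reported_accuracy)
for position, (name, accuracy) in enumerate(ranking, start=1):
    print(f"{position}. {name:<20} {accuracy:6.2f}%")

# Difference behind the "almost the same" remark about Decision Tree and Gradient Boosting.
tree_vs_boosting = gap(reported_accuracy, "Gradient Boosting", "Decision Tree")
print(f"Gradient Boosting vs. Decision Tree: {tree_vs_boosting} percentage points")
```

Ranked this way, the four tree-based and linear models sit within about five percentage points of each other, while KNN and SVM trail by more than thirty.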

Author information

Corresponding author

Correspondence to S. M. Mahamudul Hasan.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mahamudul Hasan, S.M., Jakilim, N.M., Forhad Rabbi, M., Rahman Pir, R.M.S. (2022). Determining the Most Effective Machine Learning Techniques for Detecting Phishing Websites. In: Unhelker, B., Pandey, H.M., Raj, G. (eds) Applications of Artificial Intelligence and Machine Learning. Lecture Notes in Electrical Engineering, vol 925. Springer, Singapore. https://doi.org/10.1007/978-981-19-4831-2_48

  • DOI: https://doi.org/10.1007/978-981-19-4831-2_48

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-4830-5

  • Online ISBN: 978-981-19-4831-2

  • eBook Packages: Computer Science, Computer Science (R0)