Determining the Most Effective Machine Learning Techniques for Detecting Phishing Websites

Mahamudul Hasan, S. M.; Jakilim, Nirjas Mohammad; Forhad Rabbi, Md.; Rahman Pir, Rumel M. S.

doi:10.1007/978-981-19-4831-2_48

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 925)

Abstract

The Internet's rapid growth has shifted consumer habits away from conventional shopping and toward electronic commerce. Rather than robbing banks or shops, today's criminals use a range of sophisticated cyber methods to reach their victims. Attackers have developed new ways of deceiving customers, such as phishing, in which fake websites are used to gather sensitive information such as account IDs, usernames, and passwords. Because these attacks are semantic in nature and mainly exploit the vulnerabilities of computer users, establishing the authenticity of a web page is difficult. Machine learning (ML) is a widely used data analysis technique that has shown promising results in the battle against phishing. This article examines the applicability of machine learning methods for identifying phishing attempts, together with their advantages and disadvantages. Specifically, a variety of machine learning methods are explored to find appropriate anti-phishing solutions. We apply six machine learning classification methods to real-world phishing datasets and evaluate them against several criteria. In this study, the Random Forest classifier achieved the highest accuracy, 97.17%, followed by the Gradient Boosting classifier at 94.75% and the Decision Tree classifier at 94.69%. Logistic Regression reached 92.76%, while KNN and SVM reached only 60.45% and 56.04%, respectively. KNN had trouble detecting phishing sites, as reflected in its low accuracy. The Decision Tree performed almost as well as Gradient Boosting.
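
The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses default hyperparameters, and substitutes a synthetic dataset from make_classification for the real phishing data, so its accuracies will not match the figures reported above. The helper names build_classifiers and compare_classifiers are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(seed=0):
    """Return the six classifier families named in the abstract.

    Hyperparameters are scikit-learn defaults (an assumption; the abstract
    does not state the settings the authors used).
    """
    return {
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(),
        "SVM": SVC(),
    }


def compare_classifiers(X, y, seed=0, test_size=0.2):
    """Fit each classifier on one train split and score it on the held-out split.

    Returns a dict of name -> accuracy, ordered from best to worst.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    scores = {}
    for name, model in build_classifiers(seed).items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# Synthetic stand-in for a phishing dataset (label 1 = phishing, 0 = legitimate).
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=10, random_state=0
)

for name, accuracy in compare_classifiers(X, y).items():
    print(f"{name:20s} {accuracy:.2%}")
```

To apply the sketch to real data, X would hold the per-website feature vectors and y the phishing labels. A single train/test split is used here for brevity; cross-validation would give a more stable ranking.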

Author information

Authors and Affiliations

Shahjalal University of Science and Technology, Sylhet, Bangladesh
S. M. Mahamudul Hasan, Nirjas Mohammad Jakilim & Md. Forhad Rabbi
Leading University, Sylhet, Bangladesh
Rumel M. S. Rahman Pir

Corresponding author

Correspondence to S. M. Mahamudul Hasan.

Editor information

Editors and Affiliations

University of South Florida Sarasota–Manatee, Sarasota, FL, USA
Bhuvan Unhelker
Bournemouth University, Poole, UK
Hari Mohan Pandey
School of Engineering and Technology, Sharda University, Greater Noida, India
Gaurav Raj

About this paper

Cite this paper

Mahamudul Hasan, S.M., Jakilim, N.M., Forhad Rabbi, M., Rahman Pir, R.M.S. (2022). Determining the Most Effective Machine Learning Techniques for Detecting Phishing Websites. In: Unhelker, B., Pandey, H.M., Raj, G. (eds) Applications of Artificial Intelligence and Machine Learning. Lecture Notes in Electrical Engineering, vol 925. Springer, Singapore. https://doi.org/10.1007/978-981-19-4831-2_48

DOI: https://doi.org/10.1007/978-981-19-4831-2_48
Published: 14 September 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-4830-5
Online ISBN: 978-981-19-4831-2
eBook Packages: Computer Science, Computer Science (R0)
