Analysis of Ensemble Methods for Phishing Detection

Gupta, Deepak; Gandotra, Ekta; Mohan, Yogesh; Singh, Sukhvir

doi:10.1007/978-3-031-34873-0_4

The original version of the chapter has been revised. A correction to this chapter can be found at https://doi.org/10.1007/978-3-031-34873-0_16

Abstract

Phishing is one of the most serious problems in cyberspace, causing monetary losses in both the public and private sectors. The escalating number of phishing attacks is a major concern for security experts, and detecting them with high accuracy has long been a difficult problem. Conventional tools for detecting phishing webpages rely on signature-based methods, which cannot detect zero-day phishing webpages. Security researchers have therefore turned to machine learning and deep learning algorithms to detect newly created phishing webpages. This chapter studies and compares various machine learning and ensemble methods for the classification and detection of phishing webpages. A comparative analysis is carried out of machine learning techniques such as Naïve Bayes (NB), logistic regression (LR), k-nearest neighbor (k-NN), decision table (DT) and random forest (RF), and of ensemble methods such as bagging, boosting, stacking and voting. Experiments are conducted on a phishing dataset with 30 features containing 6157 benign and 4898 phishing webpages. The experimental results show that the stacking ensemble method achieves the best accuracy, 96.987%, among the methods compared for detecting phishing webpages.
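
To put the reported figures in context, the following is a minimal sketch, not the authors' implementation. It computes the dataset size and the accuracy of a trivial majority-class predictor from the class counts given in the abstract, and it illustrates hard (majority) voting, one of the ensemble methods named above. The function `majority_vote` and the three lists of model outputs are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Class counts reported in the abstract.
BENIGN = 6157
PHISHING = 4898

total = BENIGN + PHISHING
# Accuracy of a trivial classifier that always predicts the larger class.
majority_baseline = max(BENIGN, PHISHING) / total


def majority_vote(predictions_per_model):
    """Hard voting: for each sample, return the label most models predicted."""
    # zip(*...) regroups the per-model lists into per-sample tuples of votes.
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_model)
    ]


# Hypothetical outputs of three base classifiers on four pages
# (1 = phishing, 0 = benign); these are NOT results from the chapter.
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 0, 1, 0]
ensemble = majority_vote([model_a, model_b, model_c])

print(f"samples: {total}, majority-class baseline: {majority_baseline:.3%}")
print("voted labels:", ensemble)
```

Using an odd number of voters on a two-class problem avoids ties, which is why three hypothetical models are used here. Stacking, the method the abstract reports as most accurate, differs in that a second-level learner is trained on the base models' outputs instead of counting votes.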

Change history

03 November 2023
A correction has been published.

Author information

Authors and Affiliations

Department of Computer Science & Engineering and Information Technology, Jaypee University of Information Technology, Waknaghat, Solan, Himachal Pradesh, India
Deepak Gupta & Ekta Gandotra
Department of Computer Science and Applications, Himachal Pradesh University, Shimla, Himachal Pradesh, India
Yogesh Mohan
Department of Computer Science, Himachal Pradesh University Regional Centre, Dharamshala, Himachal Pradesh, India
Sukhvir Singh

Editor information

Editors and Affiliations

Department of Electronics and Instrumentation, University of Kashmir, Srinagar, Jammu and Kashmir, India
Shabir A. Parah
Aligarh Muslim University, Aligarh, India
Nasir N. Hurrah
Department of Electronics Engineering, Aligarh Muslim University, AMU Aligarh, India
Ekram Khan

About this chapter

Cite this chapter

Gupta, D., Gandotra, E., Mohan, Y., Singh, S. (2023). Analysis of Ensemble Methods for Phishing Detection. In: Parah, S.A., Hurrah, N.N., Khan, E. (eds) Intelligent Multimedia Signal Processing for Smart Ecosystems. Springer, Cham. https://doi.org/10.1007/978-3-031-34873-0_4

DOI: https://doi.org/10.1007/978-3-031-34873-0_4
Published: 01 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34872-3
Online ISBN: 978-3-031-34873-0
eBook Packages: Computer Science, Computer Science (R0)
