Abstract
Phishing is one of the most serious threats in cyberspace, causing monetary losses in both the public and private sectors. The growing number of phishing attacks is a major concern for security experts, and detecting them with high accuracy remains a difficult problem. Conventional tools for detecting phishing webpages rely on signature-based methods, which cannot detect zero-day phishing webpages. Security researchers have therefore turned to machine learning and deep learning algorithms to detect newly created phishing webpages. This chapter studies and compares machine learning and ensemble methods for classifying and detecting phishing webpages. It presents a comparative analysis of machine learning techniques, namely Naïve Bayes (NB), logistic regression (LR), k-nearest neighbor (k-NN), decision table (DT), and random forest (RF), and of ensemble methods, namely bagging, boosting, stacking, and voting. Experiments are conducted on a phishing dataset with 30 features containing 6157 benign and 4898 phishing webpages. The results show that the stacking ensemble achieves the best accuracy, 96.987%, among the methods evaluated.
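The dataset figures and the voting idea named in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the function names and the example labels are hypothetical, and only the two class counts come from the abstract. The first function gives the accuracy of always predicting the larger class, a reference point for reading the reported 96.987%. The second shows hard (majority) voting over the outputs of several base classifiers.

```python
from collections import Counter
from typing import Sequence

# Class counts taken from the abstract.
N_BENIGN = 6157
N_PHISHING = 4898


def majority_baseline_accuracy(n_benign: int, n_phishing: int) -> float:
    """Accuracy of a trivial classifier that always predicts the larger class."""
    total = n_benign + n_phishing
    return max(n_benign, n_phishing) / total


def hard_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by the most base classifiers.

    Ties go to the tied label that appears first in `predictions`.
    """
    counts = Counter(predictions)
    return max(predictions, key=lambda label: counts[label])


if __name__ == "__main__":
    print(N_BENIGN + N_PHISHING)  # 11055 webpages in total
    baseline = majority_baseline_accuracy(N_BENIGN, N_PHISHING)
    print(round(100 * baseline, 2))  # 55.69 (percent)
    # Hypothetical outputs of three base classifiers for one webpage.
    print(hard_vote(["phishing", "benign", "phishing"]))  # phishing
```

Stacking differs from voting in that a second-level learner is trained on the base classifiers' outputs instead of counting them. That step is left out here because it needs a trained model.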
Change history
03 November 2023
A correction has been published.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Gupta, D., Gandotra, E., Mohan, Y., Singh, S. (2023). Analysis of Ensemble Methods for Phishing Detection. In: Parah, S.A., Hurrah, N.N., Khan, E. (eds) Intelligent Multimedia Signal Processing for Smart Ecosystems. Springer, Cham. https://doi.org/10.1007/978-3-031-34873-0_4
DOI: https://doi.org/10.1007/978-3-031-34873-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34872-3
Online ISBN: 978-3-031-34873-0
eBook Packages: Computer Science, Computer Science (R0)