Analysis of Ensemble Methods for Phishing Detection

  • Chapter in: Intelligent Multimedia Signal Processing for Smart Ecosystems

Abstract

Phishing is one of the biggest problems in cyberspace and causes monetary losses in both the public and private sectors. The escalating number of phishing attacks is a major concern for security experts, and detecting these attacks with high accuracy remains difficult. Conventional tools for detecting phishing webpages rely on signature-based methods, which cannot detect zero-day phishing webpages. Security researchers have therefore started to use machine learning and deep learning algorithms to detect newly created phishing webpages. This chapter studies and compares various machine learning and ensemble methods for the classification and detection of phishing webpages. A comparative analysis is carried out of machine learning techniques, namely Naïve Bayes (NB), logistic regression (LR), k-nearest neighbor (k-NN), decision table (DT) and random forest (RF), and of ensemble methods, namely bagging, boosting, stacking and voting. Experiments are conducted on a phishing dataset with 30 features containing 6157 benign and 4898 phishing webpages. The results show that the stacking ensemble method provides the best accuracy, 96.987%, among the methods evaluated for detecting phishing webpages.
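To put the reported accuracy in context, the following minimal sketch computes the accuracy of a trivial classifier that always predicts the larger class, using the class counts given in the abstract. It then illustrates voting, one of the ensemble methods named above, as a simple majority (hard) vote. The three rule-based classifiers, their feature names (`has_ip_address`, `url_length`, `uses_https`) and the length threshold are hypothetical stand-ins chosen for illustration. They are not the chapter's features or models, and the chapter's own voting configuration is not described in the abstract.

```python
from collections import Counter

# Class counts as stated in the abstract.
N_BENIGN = 6157
N_PHISHING = 4898


def majority_baseline_accuracy(n_benign, n_phishing):
    """Accuracy of a classifier that always predicts the larger class."""
    total = n_benign + n_phishing
    return max(n_benign, n_phishing) / total


def hard_vote(predictions):
    """Return the label predicted by the most base classifiers.

    With two labels and an odd number of voters no tie can occur.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical base classifiers: each maps a page (a dict of features)
# to a label. Feature names and the threshold are assumptions.
def rule_ip(page):
    return "phishing" if page["has_ip_address"] else "benign"


def rule_length(page):
    return "phishing" if page["url_length"] > 75 else "benign"


def rule_https(page):
    return "benign" if page["uses_https"] else "phishing"


def ensemble_predict(page, classifiers):
    """Collect one vote per base classifier and return the majority label."""
    return hard_vote([clf(page) for clf in classifiers])


if __name__ == "__main__":
    baseline = majority_baseline_accuracy(N_BENIGN, N_PHISHING)
    print(f"Majority-class baseline accuracy: {baseline:.2%}")

    page = {"has_ip_address": True, "url_length": 30, "uses_https": False}
    label = ensemble_predict(page, [rule_ip, rule_length, rule_https])
    print(f"Ensemble prediction: {label}")
```

The baseline comes out at roughly 56%, so any accuracy figure for this dataset is best read against that floor rather than against 50%. In practice the base classifiers would be trained models such as NB, LR or k-NN rather than hand-written rules; the voting step itself stays the same.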

Change history

  • 03 November 2023

    A correction has been published.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Gupta, D., Gandotra, E., Mohan, Y., Singh, S. (2023). Analysis of Ensemble Methods for Phishing Detection. In: Parah, S.A., Hurrah, N.N., Khan, E. (eds) Intelligent Multimedia Signal Processing for Smart Ecosystems. Springer, Cham. https://doi.org/10.1007/978-3-031-34873-0_4

  • DOI: https://doi.org/10.1007/978-3-031-34873-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34872-3

  • Online ISBN: 978-3-031-34873-0

  • eBook Packages: Computer Science, Computer Science (R0)
