An Efficient Approach for Phishing Detection using Machine Learning

  • Chapter
Multimedia Security

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

The increasing number of phishing attacks is a major concern for security researchers today. Traditional tools for identifying phishing websites use signature-based approaches, which cannot detect newly created phishing webpages. Researchers have therefore turned to machine learning-based methods, which can detect and classify phishing webpages with high accuracy when a large and varied set of features is considered. However, building a classification model from a large number of features takes time, which hampers the timely detection of phishing websites. It is therefore pertinent to shortlist a set of features using a feature selection method so that high-performance classification models can be developed in less time. In this chapter, we study the role of feature selection methods in detecting phishing webpages efficiently and effectively. A comparative analysis of machine learning algorithms is carried out on the basis of their performance with and without feature selection. Experiments are conducted on a phishing dataset with 30 features containing 4898 phishing and 6157 benign webpages. Several machine learning algorithms are first trained on the full feature set to obtain the best results. Afterward, a feature selection method is employed to improve the efficiency of the models. Random forest achieves the best accuracy both before and after feature selection, with a significant improvement in model building time after selection. The experiments demonstrate that employing a feature selection method along with machine learning algorithms can improve the build time of classification models for phishing detection without compromising their accuracy.
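
The feature-shortlisting step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the feature selection method used, so ranking features by information gain is an assumption here, not the chapter's stated technique. The data is a small synthetic stand-in rather than the 30-feature phishing dataset, and the function names (`entropy`, `information_gain`, `select_top_k`) are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy obtained by splitting on one feature column."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_top_k(rows, labels, k):
    """Return the indices of the k highest-gain features, plus all gains."""
    gains = [information_gain([row[j] for row in rows], labels)
             for j in range(len(rows[0]))]
    ranked = sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)
    return ranked[:k], gains


# Synthetic stand-in data (NOT the chapter's dataset): features 0 and 1
# track the label with some noise, features 2..5 are pure noise.
rng = random.Random(0)
labels = [rng.choice([0, 1]) for _ in range(500)]
rows = []
for y in labels:
    sign = 1 if y == 1 else -1
    f0 = sign if rng.random() < 0.9 else -sign
    f1 = sign if rng.random() < 0.8 else -sign
    noise = [rng.choice([-1, 0, 1]) for _ in range(4)]
    rows.append([f0, f1] + noise)

# Keep only the top-ranked columns; a classifier would then be trained on
# reduced_rows instead of rows, which is where build time is saved.
selected, gains = select_top_k(rows, labels, k=2)
reduced_rows = [[row[j] for j in selected] for row in rows]
print("selected feature indices:", selected)
```

In a comparison like the one the abstract describes, the same classifier would be trained once on `rows` and once on `reduced_rows`, recording accuracy and build time for each run.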

Author information

Corresponding author

Correspondence to Deepak Gupta.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Gandotra, E., Gupta, D. (2021). An Efficient Approach for Phishing Detection using Machine Learning. In: Giri, K.J., Parah, S.A., Bashir, R., Muhammad, K. (eds) Multimedia Security. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-8711-5_12