An Efficient Approach for Phishing Detection using Machine Learning

Gandotra, Ekta; Gupta, Deepak

doi:10.1007/978-981-15-8711-5_12

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

The increasing number of phishing attacks is one of the major concerns of security researchers today. Traditional tools for identifying phishing websites use signature-based approaches, which cannot detect newly created phishing webpages. Researchers are therefore developing machine learning-based methods that can detect and classify phishing webpages with high accuracy when a large and varied set of features is considered. However, building a classification model from a large number of features takes time, which hampers the timely detection of phishing websites. It is therefore pertinent to shortlist a set of features using a feature selection method so that high-performance classification models can be built in less time. In this chapter, we study the role of feature selection methods in detecting phishing webpages efficiently and effectively. A comparative analysis of machine learning algorithms is carried out on the basis of their performance with and without feature selection. Experiments are conducted on a phishing dataset with 30 features containing 4898 phishing and 6157 benign webpages. Several machine learning algorithms are evaluated to obtain the best results. A feature selection method is then employed to improve the efficiency of the models. Random forest achieves the best accuracy both before and after feature selection, with a significant improvement in model building time once features are selected. The experiments demonstrate that employing a feature selection method along with machine learning algorithms can improve the build time of classification models for phishing detection without compromising their accuracy.
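
The feature-shortlisting step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the feature selection method used in the chapter, so information gain ranking is assumed here purely for illustration. The data is a synthetic stand-in, not the actual phishing dataset: only the class counts (4898 phishing, 6157 benign) and the feature count (30) come from the abstract, while the value set {-1, 0, 1}, the five informative columns, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting on one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_top_k(rows, labels, k):
    """Return the indices of the k highest-gain features, plus all gains."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels)
             for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:k], gains


def make_synthetic(n_phish=4898, n_benign=6157, n_features=30,
                   n_informative=5, seed=0):
    """Synthetic stand-in data (assumption): features take values in
    {-1, 0, 1}; only the first n_informative columns track the label."""
    rng = random.Random(seed)
    labels = [1] * n_phish + [-1] * n_benign
    rows = []
    for y in labels:
        row = []
        for j in range(n_features):
            if j < n_informative and rng.random() < 0.8:
                row.append(y)  # copy the label 80% of the time
            else:
                row.append(rng.choice((-1, 0, 1)))  # pure noise
        rows.append(row)
    return rows, labels


rows, labels = make_synthetic()
selected, gains = select_top_k(rows, labels, k=5)
print("selected feature indices:", sorted(selected))
```

A classifier trained only on the `selected` columns sees 5 inputs instead of 30, which is the kind of reduction the abstract credits with shorter model building time. Whether accuracy is preserved depends on the real data and on the selection method actually used in the chapter.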

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jaypee University of Information Technology, Waknaghat, Solan, India
Ekta Gandotra
Department of Computer Science and Engineering, Thapar Institute of Engineering & Technology (Deemed to be University), Patiala, India
Deepak Gupta

Corresponding author

Correspondence to Deepak Gupta.

Editor information

Editors and Affiliations

Department of Computer Science, Islamic University of Science and Technology, Pulwama, Jammu and Kashmir, India
Kaiser J. Giri
Department of Electronics and Instrumentation Technology, University of Kashmir, Srinagar, Jammu and Kashmir, India
Shabir Ahmad Parah
Department of Computer Science, Islamic University of Science and Technology, Pulwama, Jammu and Kashmir, India
Rumaan Bashir
Department of Software, Sejong University, Seoul, Korea (Republic of)
Khan Muhammad

About this chapter

Cite this chapter

Gandotra, E., Gupta, D. (2021). An Efficient Approach for Phishing Detection using Machine Learning. In: Giri, K.J., Parah, S.A., Bashir, R., Muhammad, K. (eds) Multimedia Security. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-8711-5_12

DOI: https://doi.org/10.1007/978-981-15-8711-5_12
Published: 12 January 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8710-8
Online ISBN: 978-981-15-8711-5
eBook Packages: Intelligent Technologies and Robotics (R0)
