Abstract
Phishing is an attack in which fraudulent websites imitate the official websites of organizations such as government agencies, financial institutions, e-commerce platforms, and banks. These websites aim to obtain sensitive information from users, such as credit card numbers, email addresses, passwords, and personal identities. In response to the increasing number of phishing attacks, several anti-phishing strategies have been developed. However, existing techniques often fail to extract the most crucial features, which can lead to misclassification, and the complex algorithms they employ result in high response times. To address these challenges, this paper proposes a novel approach called Bidirectional Long Short-Term Memory based Gated Highway Attention block Convolutional Neural Network (BiLSTM-GHA-CNN) for detecting phishing URLs. The BiLSTM captures contextual features, while the CNN extracts salient features. Integrating a highway network into the BiLSTM-CNN architecture enables the capture of significant features with rapid convergence. Furthermore, a gating mechanism is employed to weigh the output features of the CNN and BiLSTM. Five datasets were created for experimentation from diverse sources such as PhishTank and OpenPhish. The results demonstrate that BiLSTM-GHA-CNN achieves superior detection accuracy, precision, recall, and F1-score compared to state-of-the-art techniques. Moreover, the proposed system reduces the response time to 12.46 ms.
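To make the highway and gating ideas concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the standard highway formulation y = T(x) * H(x) + (1 - T(x)) * x and a simple sigmoid gate that mixes two equal-length feature vectors; the abstract does not specify the exact wiring, so the function names (`highway`, `gated_fusion`), the tanh transform, the weight shapes, and the order in which the two steps are applied are all illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, x):
    """Return W @ x + b, where W is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway(x, w_h, b_h, w_t, b_t):
    """Standard highway layer: y = T(x) * H(x) + (1 - T(x)) * x."""
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]  # candidate transform H(x)
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]    # transform gate T(x)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def gated_fusion(cnn_feat, lstm_feat, w_g, b_g):
    """Mix two equal-length feature vectors with a sigmoid gate.

    The gate is computed from the concatenation of both inputs
    (an assumption; other gate inputs are possible).
    """
    g = [sigmoid(v) for v in affine(w_g, b_g, cnn_feat + lstm_feat)]
    return [gi * c + (1.0 - gi) * l
            for gi, c, l in zip(g, cnn_feat, lstm_feat)]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    # Stand-ins for the CNN and BiLSTM output features of one URL.
    cnn_feat = [rng.uniform(-1, 1) for _ in range(dim)]
    lstm_feat = [rng.uniform(-1, 1) for _ in range(dim)]

    fused = gated_fusion(cnn_feat, lstm_feat,
                         random_matrix(dim, 2 * dim, rng), [0.0] * dim)
    out = highway(fused,
                  random_matrix(dim, dim, rng), [0.0] * dim,
                  random_matrix(dim, dim, rng), [-1.0] * dim)
    print(out)
```

A negative transform-gate bias, as used in the demo, starts the highway layer close to the identity mapping, which is the usual motivation for highway connections: inputs pass through largely unchanged early in training, which tends to ease optimization.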
Data Availability
Web page Phishing Detection dataset: https://www.kaggle.com/datasets/shashwatwork/web-page-phishing-detection-dataset
PhishTank: https://www.phishtank.com/
OpenPhish: https://www.openphish.com/
Banking website: https://www.similarweb.com/top-websites/category/finance/banking-credit-and-lending/
Common Crawl: https://www.domcop.com/top-10-million-websites
Abbreviations
- ML: Machine Learning
- BiLSTM: Bidirectional Long Short-Term Memory
- GHA: Gated Highway Attention
- CNN: Convolutional Neural Network
- KNN: K-Nearest Neighbors
- SVM: Support Vector Machine
- MFET: Multiple Feature Extraction Technique
- AWT: Automated Whitelist Technique
- MSA: Multi-Head Self-Attention
- GAN: Generative Adversarial Network
- NIOSELM: Non-Inverse Matrix Extreme Learning Machine
- ADASYN: Adaptive Synthetic Sampling
- DAE: Denoising Auto-Encoder
- DNN: Deep Neural Network
- DTOF-ANN: Decision Tree and Optimal Features based Artificial Neural Network
- TF-IDF: Term Frequency-Inverse Document Frequency
- HABCNN: Highway Attention Block CNN
- ABM: Attention Block Module
- RNN: Recurrent Neural Network
- TPR: True Positive Rate
- FPR: False Positive Rate
- TNR: True Negative Rate
- FNR: False Negative Rate
Funding
This study received no funding.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Manika Nanda and Shivani Goel. The first draft of the manuscript was written by Manika Nanda, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Conceptualization: Manika Nanda, Pankaj Sharma; Methodology: Manika Nanda; Formal analysis and investigation: Manika Nanda, Shivani Goel; Writing—original draft preparation: Manika Nanda, Pankaj Sharma; Writing—review and editing: Shivani Goel, Pankaj Sharma; Supervision: Shivani Goel.
Ethics declarations
Ethical approval
This article does not contain any studies with human participants and/or animals performed by any of the authors.
Informed consent
Informed consent was not applicable to this study.
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nanda, M., Goel, S. URL based phishing attack detection using BiLSTM-gated highway attention block convolutional neural network. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-023-17993-0
DOI: https://doi.org/10.1007/s11042-023-17993-0