
URL based phishing attack detection using BiLSTM-gated highway attention block convolutional neural network

Multimedia Tools and Applications

Abstract

Phishing is an attack that imitates the official websites of organizations such as government agencies, financial institutions, e-commerce platforms, and banks. These fraudulent websites aim to obtain sensitive information from users, such as credit card numbers, email addresses, passwords, and personal identities. In response to the increasing number of phishing attacks, several anti-phishing strategies have been developed. However, existing techniques often fail to extract the most crucial features, which can lead to misclassification. Additionally, the complex algorithms they employ result in high response times. To address these challenges, this paper proposes a novel approach called Bidirectional Long Short-Term Memory based Gated Highway Attention block Convolutional Neural Network (BiLSTM-GHA-CNN) for detecting phishing URLs. The BiLSTM captures contextual features, while the CNN extracts salient features. Integrating the highway network into the BiLSTM-CNN architecture enables the capture of significant features with rapid convergence. Furthermore, a gating mechanism is employed to weigh the output features of the CNN and BiLSTM. Five datasets were created for experimentation from diverse sources such as PhishTank and OpenPhish. The results demonstrate that BiLSTM-GHA-CNN achieves superior detection accuracy, precision, recall, and F1-score compared to state-of-the-art techniques. Moreover, the proposed system reduces the response time to 12.46 ms.
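
The abstract names two combining mechanisms, a highway connection and a gate that weighs the CNN and BiLSTM outputs, but the preview does not give their equations. The following is a minimal sketch of the generic forms only, assuming the standard highway formulation y = T(x)·H(x) + (1 − T(x))·x and a hypothetical element-wise sigmoid gate; the function names, weights, and shapes are illustrative assumptions, not the authors' implementation.

```python
import math


def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def highway(x, h, t):
    """Generic highway combination, element-wise: y = t * h + (1 - t) * x.

    x: layer input, h: transformed input H(x), t: transform gate T(x) in [0, 1].
    With t = 0 the input passes through unchanged; with t = 1 only H(x) is kept.
    """
    return [ti * hi + (1.0 - ti) * xi for xi, hi, ti in zip(x, h, t)]


def gated_fusion(cnn_feat, lstm_feat, w_c, w_l, b):
    """Hypothetical gate weighing CNN features against BiLSTM features.

    Per element: g = sigmoid(w_c * c + w_l * l + b), out = g * c + (1 - g) * l.
    The per-element weights w_c, w_l and bias b are assumed, not from the paper.
    """
    fused = []
    for c, l, wc, wl, bi in zip(cnn_feat, lstm_feat, w_c, w_l, b):
        g = sigmoid(wc * c + wl * l + bi)
        fused.append(g * c + (1.0 - g) * l)
    return fused


if __name__ == "__main__":
    # Toy vectors standing in for learned features.
    print(highway([1.0, 2.0], [3.0, 4.0], [0.0, 1.0]))
    print(gated_fusion([2.0], [4.0], [0.0], [0.0], [0.0]))
```

In a trained network the gate values would come from learned parameters rather than fixed numbers; the sketch only shows how a gate interpolates between two feature sources.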

Data Availability

Web page Phishing Detection dataset: https://www.kaggle.com/datasets/shashwatwork/web-page-phishing-detection-dataset

PhishTank: https://www.phishtank.com/

OpenPhish: https://www.openphish.com/

Banking website: https://www.similarweb.com/top-websites/category/finance/banking-credit-and-lending/

Common Crawl: https://www.domcop.com/top-10-million-websites

Abbreviations

ML: Machine Learning
BiLSTM: Bidirectional Long Short-Term Memory
GHA: Gated Highway Attention
CNN: Convolutional Neural Network
KNN: K-Nearest Neighbors
SVM: Support Vector Machine
MFET: Multiple feature extraction technique
AWT: Automated Whitelist Technique
MSA: Multi-head self-attention
GAN: Generative Adversarial Network
NIOSELM: Non-inverse matrix extreme learning machine
ADASYN: Adaptive Synthetic Sampling
DAE: Denoising auto-encoder
DNN: Deep Neural Network
DTOF-ANN: Decision Tree and Optimal Features based Artificial Neural Network
TF-IDF: Term Frequency-Inverse Document Frequency
HABCNN: Highway attention block CNN
ABM: Attention block module
RNN: Recurrent Neural Network
TPR: True Positive Rate
FPR: False Positive Rate
TNR: True Negative Rate
FNR: False Negative Rate
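
The four rates listed above, together with the accuracy, precision, recall, and F1-score mentioned in the abstract, all derive from confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp: phishing URLs flagged as phishing, fn: phishing URLs missed,
    tn: legitimate URLs passed,            fp: legitimate URLs flagged.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # identical to TPR
    return {
        "TPR": recall,
        "FPR": ratio(fp, fp + tn),
        "TNR": ratio(tn, fp + tn),
        "FNR": ratio(fn, tp + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "F1": ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    for name, value in detection_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.4f}")
```

Note that TPR + FNR = 1 and TNR + FPR = 1, so reporting one rate of each pair determines the other.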

Funding

This study received no funding.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Manika Nanda and Shivani Goel. The first draft of the manuscript was written by Manika Nanda, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Conceptualization: Manika Nanda, Pankaj Sharma; Methodology: Manika Nanda; Formal analysis and investigation: Manika Nanda, Shivani Goel; Writing—original draft preparation: Manika Nanda, Pankaj Sharma; Writing—review and editing: Shivani Goel, Pankaj Sharma; Supervision: Shivani Goel.

Corresponding author

Correspondence to Manika Nanda.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants and/or animals performed by any of the authors.

Informed consent

There is no informed consent for this study.

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Nanda, M., Goel, S. URL based phishing attack detection using BiLSTM-gated highway attention block convolutional neural network. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-023-17993-0

  • DOI: https://doi.org/10.1007/s11042-023-17993-0
