SmartiPhish: a reinforcement learning-based intelligent anti-phishing solution to detect spoofed website attacks

Ariyadasa, Subhash; Fernando, Shantha; Fernando, Subha

doi:10.1007/s10207-023-00778-9

Regular Contribution
Published: 21 November 2023

Volume 23, pages 1055–1076, (2024)
International Journal of Information Security

Abstract

Phishing, a well-known cyberattack that cannot be completely eradicated from the Internet, has increased dramatically since the COVID-19 pandemic. Despite previous efforts to reduce this prevalent threat, constantly changing attacks make phishing detection difficult. Existing solutions compound the difficulty because they lack continuous learning support and a systematic knowledge acquisition process. SmartiPhish is introduced in this context as the first anti-phishing solution with integrated continuous learning support and an innovative knowledge acquisition process. SmartiPhish combines deep learning and reinforcement learning to detect phishing. The deep learning model predicts a phishing probability for a given web page based on its URL and HTML content; the probability is then passed to a reinforcement learning environment, which makes a decision based on the popularity of the web page and prior knowledge of it. SmartiPhish has a detection accuracy of 96.40% and a detection time of 4.3 s. It performs well in an imbalanced environment, and its zero-day attack detection results are also noteworthy. Furthermore, SmartiPhish demonstrated a 5.65% performance improvement in just six weeks, in contrast to the declining performance trend of existing anti-phishing tools over time.
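
The two-stage flow described in the abstract (a phishing probability from a deep learning model, followed by a reinforcement learning decision that also considers page popularity and prior knowledge) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available in the repository listed under "Availability of code and materials". Everything here is an assumption made for illustration: the names `make_state`, `QAgent`, and `train_demo`, the bucket thresholds, the reward values of +1 and -1, the synthetic probabilities standing in for the deep learning model, and the use of a simple tabular, single-step value update in place of whatever reinforcement learning algorithm the paper uses.

```python
import random
from collections import defaultdict

ACTIONS = ("legitimate", "phishing")


def make_state(phish_prob, popularity_rank, prior_label):
    """Discretise the three signals into a small state tuple (hypothetical scheme)."""
    prob_bucket = min(int(phish_prob * 5), 4)  # five probability buckets: 0..4
    if popularity_rank is None:
        pop_bucket = 2  # page has no popularity ranking
    elif popularity_rank <= 100_000:
        pop_bucket = 0  # popular page (assumed threshold)
    else:
        pop_bucket = 1  # ranked but not popular
    # prior_label is "phishing", "legitimate", or None when the page is unknown
    return (prob_bucket, pop_bucket, prior_label)


class QAgent:
    """Tabular agent; each page is treated as a one-step episode (an assumption)."""

    def __init__(self, alpha=0.1, epsilon=0.1, seed=0):
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
        self.alpha = alpha
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, state, explore=True):
        # Epsilon-greedy: occasionally try a random action while learning.
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        values = self.q[state]
        return max(ACTIONS, key=values.get)

    def learn(self, state, action, reward):
        # Move the stored value toward the observed reward.
        old = self.q[state][action]
        self.q[state][action] = old + self.alpha * (reward - old)


def train_demo(agent, n=5000, seed=1):
    """Train on synthetic pages; the probabilities stand in for a real model."""
    rng = random.Random(seed)
    for _ in range(n):
        is_phish = rng.random() < 0.5
        prob = rng.uniform(0.6, 1.0) if is_phish else rng.uniform(0.0, 0.4)
        rank = None if is_phish else rng.randint(1, 50_000)
        state = make_state(prob, rank, None)
        action = agent.act(state)
        truth = "phishing" if is_phish else "legitimate"
        agent.learn(state, action, 1.0 if action == truth else -1.0)


if __name__ == "__main__":
    agent = QAgent()
    train_demo(agent)
    print(agent.act(make_state(0.95, None, None), explore=False))
    print(agent.act(make_state(0.05, 10, None), explore=False))
```

Because the value table keeps updating as feedback arrives, a design of this kind can keep adapting after deployment, which is the property the abstract calls continuous learning support. The sketch does not attempt to reproduce the reported accuracy or detection time.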

Availability of data

The primary dataset supporting the conclusions of this article is available in the Mendeley Data repository, n96ncsr5g4/1, https://data.mendeley.com/datasets/n96ncsr5g4/1. The specific data supporting some of the findings of this study are available from the corresponding author on request.

Availability of code and materials

The code required to implement most of the research manuscript’s concepts and findings can be accessed in the GitHub repository, which can be found at https://github.com/sna-hm/SmartiPhish. The practical application of the proposed solution in this manuscript, particularly in web surfing tasks within a natural web environment, has been demonstrated through a publicly shared video (https://youtu.be/MddiKIFvXM).

Acknowledgements

There is no third person or organisation to acknowledge.

Funding

The authors declare that no funding was used for this work; there are no funding sources to disclose.

Author information

Authors and Affiliations

Department of Computational Mathematics, University of Moratuwa, Katubadda, Moratuwa, 10400, Western, Sri Lanka
Subhash Ariyadasa & Subha Fernando
Computer Science and Engineering, University of Moratuwa, Katubadda, Moratuwa, 10400, Western, Sri Lanka
Shantha Fernando
Department of Computer Science and Informatics, Uva Wellassa University, Passara Road, Badulla, Uva, 90000, Sri Lanka
Subhash Ariyadasa

Authors

Subhash Ariyadasa
Shantha Fernando
Subha Fernando

Contributions

Subhash Ariyadasa (SA) authored the main manuscript and all its components. In addition to writing, SA took responsibility for designing, developing, and implementing the experiments detailed in the manuscript. Shantha Fernando and Subha Fernando (both abbreviated SF) provided supervision, review, and valuable suggestions for further improvements in design and implementation.

Corresponding author

Correspondence to Subhash Ariyadasa.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ariyadasa, S., Fernando, S. & Fernando, S. SmartiPhish: a reinforcement learning-based intelligent anti-phishing solution to detect spoofed website attacks. Int. J. Inf. Secur. 23, 1055–1076 (2024). https://doi.org/10.1007/s10207-023-00778-9

Accepted: 22 October 2023
Published: 21 November 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s10207-023-00778-9
