SmartiPhish: a reinforcement learning-based intelligent anti-phishing solution to detect spoofed website attacks

  • Regular Contribution
International Journal of Information Security

Abstract

Phishing, a well-known cyberattack that cannot be completely eradicated from the Internet, has increased dramatically since the COVID-19 pandemic. Despite previous efforts to reduce this prevalent Internet threat, constantly changing attacks make phishing detection a difficult task. Existing solutions lack continuous learning support and a systematic knowledge acquisition process, which makes detection even more difficult. SmartiPhish is introduced in this context as the first anti-phishing solution that integrates continuous learning support with an innovative knowledge acquisition process. SmartiPhish combines deep learning and reinforcement learning to detect phishing. The deep learning model predicts a phishing probability for a given web page from its URL and HTML content; this probability is then passed to a reinforcement learning environment to make a decision based on the popularity of the web page and prior knowledge of it. SmartiPhish achieves a detection accuracy of 96.40% with a detection time of 4.3 s. It performs well in an imbalanced environment, and its zero-day attack detection results are also noteworthy. Furthermore, SmartiPhish demonstrated a 5.65% performance improvement in just six weeks, in contrast to the declining performance trend of existing anti-phishing tools over time.
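
The two-stage decision flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tabular agent, the state encoding (`make_state`), the +1/-1 reward, and the synthetic training data are all assumptions made for illustration. The deep learning stage is represented only by a given probability value.

```python
import random
from collections import defaultdict

ACTIONS = ("legitimate", "phishing")  # hypothetical action set


def make_state(phish_prob, popular, known_phishing):
    """Discretise the three signals into a hashable state (assumed encoding)."""
    prob_bin = min(int(phish_prob * 10), 9)  # ten probability buckets, 0..9
    return (prob_bin, int(popular), int(known_phishing))


class TabularAgent:
    """Epsilon-greedy agent with one Q-value per (state, action) pair."""

    def __init__(self, alpha=0.1, epsilon=0.1, seed=0):
        self.q = defaultdict(lambda: [0.0] * len(ACTIONS))
        self.alpha = alpha
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, state, explore=True):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if explore and self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        values = self.q[state]
        return max(range(len(ACTIONS)), key=values.__getitem__)

    def learn(self, state, action, reward):
        # Each page is treated as a one-step episode, so the update target
        # is the immediate reward only (no bootstrapped next-state value).
        self.q[state][action] += self.alpha * (reward - self.q[state][action])


def train_toy_agent(steps=2000, seed=0):
    """Train on synthetic pages: phishing pages score high and are unpopular."""
    data_rng = random.Random(seed)
    agent = TabularAgent(seed=seed)
    for step in range(steps):
        is_phishing = step % 2  # alternate labels: 0 = legitimate, 1 = phishing
        if is_phishing:
            prob = data_rng.uniform(0.6, 1.0)
        else:
            prob = data_rng.uniform(0.0, 0.4)
        state = make_state(prob, popular=not is_phishing, known_phishing=False)
        action = agent.act(state)
        reward = 1.0 if action == is_phishing else -1.0  # assumed reward scheme
        agent.learn(state, action, reward)
    return agent


def decide(agent, phish_prob, popular, known_phishing):
    """Greedy decision for one page, returned as a label."""
    state = make_state(phish_prob, popular, known_phishing)
    return ACTIONS[agent.act(state, explore=False)]


agent = train_toy_agent()
print(decide(agent, 0.95, popular=False, known_phishing=False))
print(decide(agent, 0.05, popular=True, known_phishing=False))
```

Under these assumptions, the agent learns to flag high-probability, unpopular pages and to pass low-probability, popular ones. Calling `learn` again as new labelled feedback arrives gives a rough analogue of the continuous learning the abstract describes.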

Availability of data

The primary dataset supporting the conclusions of this article is available in the Mendeley Data repository, n96ncsr5g4/1, https://data.mendeley.com/datasets/n96ncsr5g4/1. The specific data supporting some of the findings of this study are available from the corresponding author on request.

Availability of code and materials

The code needed to implement most of the concepts and findings in this manuscript is available in the GitHub repository at https://github.com/sna-hm/SmartiPhish. The practical application of the proposed solution, particularly for web surfing tasks within a natural web environment, is demonstrated in a publicly shared video (https://youtu.be/MddiKIFvXM).

Acknowledgements

There is no third party or organisation to acknowledge.

Funding

The authors declare that no funding was used for this work and that there are no funding sources to disclose.

Author information

Contributions

The main manuscript and all its components were authored by SA. In addition to writing, SA took responsibility for designing, developing, and implementing the experiments detailed in the manuscript. The supervision, review, and valuable suggestions for further improvements in design and implementation were provided by SF and SF.

Corresponding author

Correspondence to Subhash Ariyadasa.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ariyadasa, S., Fernando, S. & Fernando, S. SmartiPhish: a reinforcement learning-based intelligent anti-phishing solution to detect spoofed website attacks. Int. J. Inf. Secur. 23, 1055–1076 (2024). https://doi.org/10.1007/s10207-023-00778-9

  • DOI: https://doi.org/10.1007/s10207-023-00778-9
