Abstract
Phishing, a well-known cyberattack that cannot be completely eradicated from the Internet, has increased dramatically since the COVID-19 pandemic. Despite previous efforts to reduce this prevalent Internet threat, constantly changing attacks make phishing detection a difficult task. Two gaps in existing solutions make detection harder still: the lack of continuous learning support and the lack of a systematic knowledge acquisition process. SmartiPhish is introduced in this context as the first anti-phishing solution with integrated continuous learning support and an innovative knowledge acquisition process. SmartiPhish combines deep learning and reinforcement learning to detect phishing. The deep learning model predicts a phishing probability for a given web page based on its URL and HTML content, and this probability is then passed to a reinforcement learning environment, which makes a decision based on the popularity of the web page and prior knowledge of it. SmartiPhish has a detection accuracy of 96.40% and a detection time of 4.3 s. It performs well in an imbalanced environment, and its zero-day attack detection is also noteworthy. Furthermore, SmartiPhish demonstrated a 5.65% performance improvement in just six weeks, in contrast to the declining performance trend of existing anti-phishing tools over time.
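The two-stage decision flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the deep learning model is represented only by its output probability, and a one-step tabular Q-learning agent stands in for the reinforcement learning component. The names `discretize` and `DecisionAgent`, the bucket boundaries, the popularity-rank threshold, the form of the prior-knowledge signal, and the +1/-1 reward are all assumptions made for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ("legitimate", "phishing")


def discretize(prob, popularity_rank, prior_label):
    """Map the three signals to a small discrete state.

    prob: phishing probability from the (not shown) deep learning model.
    popularity_rank: site rank, or None if unranked (threshold is an assumption).
    prior_label: None, "legitimate" or "phishing" from earlier knowledge (assumed form).
    """
    prob_bucket = min(int(prob * 5), 4)  # five equal-width buckets, 0..4
    popular = popularity_rank is not None and popularity_rank <= 100_000
    return (prob_bucket, popular, prior_label)


class DecisionAgent:
    """Hypothetical one-step tabular Q-learning agent (a contextual bandit)."""

    def __init__(self, alpha=0.1, epsilon=0.1, seed=0):
        self.q = defaultdict(lambda: [0.0, 0.0])  # state -> value per action
        self.alpha = alpha
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, state, explore=True):
        # Epsilon-greedy: occasionally try a random action while learning.
        if explore and self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        values = self.q[state]
        return max(range(len(ACTIONS)), key=values.__getitem__)

    def learn(self, state, action, reward):
        # Each decision is a one-step episode, so the target is just the reward.
        self.q[state][action] += self.alpha * (reward - self.q[state][action])


if __name__ == "__main__":
    agent = DecisionAgent()
    sim = random.Random(1)
    for _ in range(5000):
        prob = sim.random()
        is_phishing = sim.random() < prob  # synthetic ground truth, not real data
        rank = None if is_phishing else sim.randint(1, 1_000_000)
        state = discretize(prob, rank, None)
        action = agent.act(state)
        correct = (ACTIONS[action] == "phishing") == is_phishing
        agent.learn(state, action, 1.0 if correct else -1.0)  # assumed reward
    print(ACTIONS[agent.act(discretize(0.95, None, None), explore=False)])
```

Because the agent keeps updating its value table as verified feedback arrives, a loop of this kind is one simple way to picture continuous learning; the data in the demonstration loop is synthetic and carries no empirical meaning.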
Availability of data
The primary dataset supporting the conclusions of this article is available in the Mendeley Data repository, n96ncsr5g4/1, https://data.mendeley.com/datasets/n96ncsr5g4/1. The specific data supporting some of the findings of this study are available from the corresponding author on request.
Availability of code and materials
The code required to implement most of the manuscript’s concepts and findings is available in the GitHub repository at https://github.com/sna-hm/SmartiPhish. The practical application of the proposed solution, particularly in web surfing tasks within a natural web environment, is demonstrated in a publicly shared video (https://youtu.be/MddiKIFvXM).
Acknowledgements
There is no third party or organisation to acknowledge.
Funding
The authors declare that no funding was received for this research; there are no funding sources to disclose.
Author information
Contributions
The main manuscript and all its components were authored by SA. In addition to writing, SA took responsibility for designing, developing, and implementing the experiments detailed in the manuscript. The supervision, review, and valuable suggestions for further improvements in design and implementation were provided by SF and SF.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ariyadasa, S., Fernando, S. & Fernando, S. SmartiPhish: a reinforcement learning-based intelligent anti-phishing solution to detect spoofed website attacks. Int. J. Inf. Secur. 23, 1055–1076 (2024). https://doi.org/10.1007/s10207-023-00778-9
DOI: https://doi.org/10.1007/s10207-023-00778-9