Abstract
Phishing is one of the most widespread social engineering attacks. Detecting phishing with Machine Learning approaches is more robust than blacklist-based detection, which depends on regular reports and updates. However, the datasets currently used to train supervised learning approaches have drawbacks: they contain only the landing pages of legitimate domains and do not include the websites' login forms, even though login forms are the most common situation in real phishing cases. As a result, the performance of Machine Learning-based models drops, especially when they are tested on login pages.
In this paper, we demonstrate that a machine learning model trained on datasets collected some years ago can perform well when tested on the same outdated datasets, but that its performance decreases notably on current datasets, even though the same features are used in both cases. We also demonstrate that, among the commonly applied machine learning algorithms, SVM is the most resilient to the new strategies used in current phishing attacks.
To support these statements, we created a new dataset, the Phishing Index Login URL dataset (PILU-60K), containing 60K URLs: legitimate index and login URLs together with phishing samples. We evaluated several machine learning methods on the known datasets PWD2016 and Ebbu2017, and also on two subsets of PILU, PIU-40K and PLU-40K, which contain only index pages and only login pages respectively, showing that the accuracy decreases remarkably. We also found that Random Forest is the recommended approach among all the evaluated methods on the newly created dataset.
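To illustrate why legitimate login URLs may be harder to tell apart from phishing URLs than legitimate index URLs are, the following minimal sketch computes a few simple lexical URL features. It is not the feature set used in the paper: the function `url_features`, the `LOGIN_HINTS` keyword list, and the three example URLs are all assumptions made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical keyword list, chosen for illustration only.
LOGIN_HINTS = ("login", "signin", "account", "auth")


def url_features(url: str) -> dict:
    """Return a few simple lexical features of a URL.

    Illustrative only; these are not the features evaluated in the paper.
    """
    parsed = urlparse(url)
    segments = [seg for seg in parsed.path.split("/") if seg]
    return {
        "url_length": len(url),
        "path_depth": len(segments),          # number of non-empty path segments
        "has_query": bool(parsed.query),      # True if a query string is present
        "has_login_hint": any(h in url.lower() for h in LOGIN_HINTS),
    }


# Made-up URLs on reserved example domains.
index_url = "https://www.example.com/"
login_url = "https://www.example.com/account/login?next=/home"
phish_url = "http://example.net/secure/login/verify?id=1"

for name, url in [("index", index_url), ("login", login_url), ("phish", phish_url)]:
    print(name, url_features(url))
```

In this toy example, the legitimate login URL shares a non-empty path, a query string, and a login keyword with the phishing URL, while the index URL has none of them. A model trained only on index pages as the legitimate class could plausibly learn such cues as signs of phishing, which is the kind of bias that the PIU-40K and PLU-40K subsets are meant to expose.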
Notes
- 9. A URL is marked as verified on PhishTank only after five people visit it and vote that it is real phishing. This increases the reliability of these samples.
- 10. Dataset available at: http://gvis.unileon.es/dataset/pilu-60k/.
Acknowledgement
This research was funded by the framework agreement between the University of León and INCIBE (Spanish National Cybersecurity Institute) under Addendum 01.
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sánchez-Paniagua, M., Fidalgo, E., González-Castro, V., Alegre, E. (2021). Impact of Current Phishing Strategies in Machine Learning Models for Phishing Detection. In: Herrero, Á., Cambra, C., Urda, D., Sedano, J., Quintián, H., Corchado, E. (eds) 13th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2020). CISIS 2019. Advances in Intelligent Systems and Computing, vol 1267. Springer, Cham. https://doi.org/10.1007/978-3-030-57805-3_9
DOI: https://doi.org/10.1007/978-3-030-57805-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57804-6
Online ISBN: 978-3-030-57805-3
eBook Packages: Intelligent Technologies and Robotics (R0)