Impact of Current Phishing Strategies in Machine Learning Models for Phishing Detection

Sánchez-Paniagua, M.; Fidalgo, E.; González-Castro, V.; Alegre, E.

doi:10.1007/978-3-030-57805-3_9

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1267)

Included in the following conference series:

Computational Intelligence in Security for Information Systems Conference

Abstract

Phishing is one of the most widespread attacks based on social engineering. Detecting phishing with Machine Learning approaches is more robust than with blacklist-based ones, which depend on regular reports and updates. However, the datasets currently used to train Supervised Learning approaches have some drawbacks. For legitimate domains, these datasets contain only the landing page and do not include the websites' login forms, although login forms are the most common situation in a real case of phishing. As a result, the performance of Machine Learning-based models drops, especially when they are tested on login pages.
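To make the landing-page versus login-page gap concrete, the following minimal sketch computes a few lexical URL descriptors. It is illustrative only: the function `url_features`, the token list `LOGIN_TOKENS`, and the three example URLs are assumptions introduced here, not the features or samples used in the paper.

```python
from urllib.parse import urlparse

# Hypothetical token list, chosen for illustration only.
LOGIN_TOKENS = ("login", "signin", "account", "secure", "verify")


def url_features(url: str) -> dict:
    """Return a few simple lexical descriptors of a URL."""
    parsed = urlparse(url)
    path = parsed.path.lower()
    return {
        "length": len(url),
        # Number of non-empty path segments, e.g. /account/login -> 2.
        "path_depth": len([seg for seg in path.split("/") if seg]),
        "num_dots_in_host": parsed.netloc.count("."),
        "has_login_token": any(tok in url.lower() for tok in LOGIN_TOKENS),
        "has_query": bool(parsed.query),
    }


# Made-up URLs on reserved example domains; these are not dataset samples.
index_url = "https://www.example.com/"
login_url = "https://www.example.com/account/login?next=/home"
phish_url = "http://example.com.secure-login.example.net/account/verify?id=1"

for name, url in [("index", index_url), ("login", login_url), ("phish", phish_url)]:
    print(name, url_features(url))
```

On these toy inputs, the legitimate login URL shares several descriptors with the phishing-style URL (deeper path, login-related tokens, a query string), whereas the index URL shares none of them. A model that has seen only index pages as legitimate examples could plausibly learn to treat those descriptors as phishing signals.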

In this paper, we demonstrate that a machine learning model trained on datasets collected some years ago can achieve high performance when tested on the same outdated datasets, but that its performance decreases notably on current datasets, with the same features used in both cases. We also demonstrate that, among the commonly applied machine learning algorithms, SVM is the most resilient to the new strategies used by current phishing attacks.

To support these statements, we created a new dataset, the Phishing Index Login URL dataset (PILU-60K), containing 60K URLs: legitimate index and login URLs together with phishing samples. We evaluated several machine learning methods on the known datasets PWD2016 and Ebbu2017, and on two subsets of PILU, PIU-40K and PLU-40K, which contain only index pages and only login pages, respectively, showing that the accuracy decreases remarkably. We also found that Random Forest is the recommended approach among all the evaluated methods with the newly created dataset.
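The evaluation protocol described above (fit on one dataset, score on others) can be sketched as follows. This is a minimal illustration under stated assumptions: the single-feature threshold classifier (`fit_stump`) is a stand-in chosen to keep the example dependency-free and is not one of the methods evaluated in the paper, all function names are hypothetical, and the tiny lists are synthetic values invented only to exercise the code.

```python
from typing import Dict, Sequence, Tuple

# (feature value, label), with label 1 = phishing and 0 = legitimate.
Sample = Tuple[float, int]


def predict(threshold: float, x: float) -> int:
    """Classify as phishing when the feature reaches the threshold."""
    return 1 if x >= threshold else 0


def accuracy(data: Sequence[Sample], threshold: float) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(threshold, x) == y for x, y in data)
    return correct / len(data)


def fit_stump(train: Sequence[Sample]) -> float:
    """Pick the threshold that maximises accuracy on the training data."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda thr: accuracy(train, thr))


def cross_dataset_report(
    train: Sequence[Sample], tests: Dict[str, Sequence[Sample]]
) -> Dict[str, float]:
    """Fit once on `train`, then report accuracy on every test set."""
    threshold = fit_stump(train)
    return {name: accuracy(data, threshold) for name, data in tests.items()}


# Synthetic toy values (think of a feature such as URL path depth).
# They are NOT taken from PWD2016, Ebbu2017 or PILU-60K.
old_style = [(0, 0), (0, 0), (1, 0), (3, 1), (4, 1), (5, 1)]
index_only = [(0, 0), (1, 0), (3, 1), (4, 1)]
login_only = [(2, 0), (3, 0), (3, 1), (4, 1)]

report = cross_dataset_report(
    old_style, {"index_only": index_only, "login_only": login_only}
)
print(report)
```

The point of the harness is the separation between the data used for fitting and the data used for scoring: the same fitted model is evaluated on each test set, so any difference in accuracy is attributable to the test data rather than to retraining.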

Notes

1. https://safebrowsing.google.com/.
2. https://www.phishtank.com/.
3. https://bit.ly/2OJDYBS.
4. https://www.cs.waikato.ac.nz/ml/weka/.
5. https://www.whois.net/.
6. http://eprints.hud.ac.uk/24330/9/Mohammad14JulyDS.
7. https://www.quantcast.com/products/measure-audience-insights/.
8. https://selenium.dev/projects/.
9. A URL is verified on Phishtank only after five people have visited it and voted that it is real phishing. This increases the reliability of these samples.
10. Dataset available at: http://gvis.unileon.es/dataset/pilu-60k/.
11. https://scikit-learn.org/.

Acknowledgement

This research was funded by the framework agreement between the University of León and INCIBE (Spanish National Cybersecurity Institute) under Addendum 01.

Author information

Authors and Affiliations

Department of Electrical, Systems and Automatics Engineering, University of León, León, Spain
M. Sánchez-Paniagua, E. Fidalgo, V. González-Castro & E. Alegre
Researcher at INCIBE (Spanish National Institute of Cybersecurity), León, Spain
M. Sánchez-Paniagua, E. Fidalgo, V. González-Castro & E. Alegre

Corresponding authors

Correspondence to M. Sánchez-Paniagua, E. Fidalgo, V. González-Castro or E. Alegre.

Editor information

Editors and Affiliations

Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Ingeniería Informática, Escuela Politécnica Superior, Universidad de Burgos, Burgos, Spain
Álvaro Herrero
Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Ingeniería Informática, Escuela Politécnica Superior, Universidad de Burgos, Burgos, Spain
Carlos Cambra
Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Ingeniería Informática, Escuela Politécnica Superior, Universidad de Burgos, Burgos, Spain
Daniel Urda
Technological Institute of Castilla y León, Burgos, Spain
Javier Sedano
Department of Industrial Engineering, University of A Coruña, La Coruña, Spain
Héctor Quintián
University of Salamanca, Salamanca, Spain
Emilio Corchado

About this paper

Cite this paper

Sánchez-Paniagua, M., Fidalgo, E., González-Castro, V., Alegre, E. (2021). Impact of Current Phishing Strategies in Machine Learning Models for Phishing Detection. In: Herrero, Á., Cambra, C., Urda, D., Sedano, J., Quintián, H., Corchado, E. (eds) 13th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2020). CISIS 2019. Advances in Intelligent Systems and Computing, vol 1267. Springer, Cham. https://doi.org/10.1007/978-3-030-57805-3_9

DOI: https://doi.org/10.1007/978-3-030-57805-3_9
Published: 28 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57804-6
Online ISBN: 978-3-030-57805-3
eBook Packages: Intelligent Technologies and Robotics (R0)
