Impact of Current Phishing Strategies in Machine Learning Models for Phishing Detection

Conference paper
13th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2020) (CISIS 2019)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1267))

Abstract

Phishing is one of the most widespread attacks based on social engineering. Detecting phishing with Machine Learning approaches is more robust than blacklist-based detection, which needs regular reports and updates. However, the datasets currently used for training Supervised Learning approaches have some drawbacks. They contain only the landing pages of legitimate domains and do not include login forms from those websites, even though a login form is the most common scenario in a real phishing case. As a result, the performance of Machine Learning-based models drops, especially when they are tested on login pages.

In this paper, we demonstrate that a machine learning model trained with datasets collected some years ago can achieve high performance when tested with the same outdated datasets, but that its performance decreases notably with current datasets, using the same features in both cases. We also demonstrate that, among the commonly applied machine learning algorithms, SVM is the most resilient to the new strategies used in current phishing attacks.

To prove these statements, we created a new dataset, the Phishing Index Login URL dataset (PILU-60K), containing 60K URLs: legitimate index and login URLs, together with phishing samples. We evaluated several machine learning methods on the known datasets PWD2016 and Ebbu2017, and also on two subsets of PILU, PIU-40K and PLU-40K, which contain only index pages and only login pages, respectively, showing that the accuracy decreases remarkably. We also found that Random Forest is the recommended approach among all the evaluated methods on the newly created dataset.
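The effect described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy URLs, the helper names `subset`, `accuracy` and `path_rule`, and the reading that each subset pairs one kind of legitimate page with the phishing samples. The naive rule stands in for a model that has only ever seen legitimate index pages; it is not one of the classifiers or feature sets evaluated in the paper.

```python
from urllib.parse import urlparse

# Hypothetical toy records (url, kind); not taken from PILU-60K.
RECORDS = [
    ("https://example.com/", "index"),
    ("https://example.org", "index"),
    ("https://example.com/login", "login"),
    ("https://example.org/account/signin", "login"),
    ("http://example.net/secure/update.php", "phishing"),
    ("http://example.net/account/verify", "phishing"),
]


def subset(records, legit_kind):
    """Keep legitimate URLs of one kind plus all phishing samples.

    Returns (url, label) pairs, where label 1 means phishing.
    """
    return [
        (url, int(kind == "phishing"))
        for url, kind in records
        if kind in (legit_kind, "phishing")
    ]


def accuracy(predict, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(url) == label for url, label in samples) / len(samples)


def path_rule(url):
    """Naive stand-in classifier: flag any URL with a non-empty path."""
    return int(urlparse(url).path not in ("", "/"))


piu = subset(RECORDS, "index")  # index-style subset (assumed layout)
plu = subset(RECORDS, "login")  # login-style subset (assumed layout)

print("index subset accuracy:", accuracy(path_rule, piu))
print("login subset accuracy:", accuracy(path_rule, plu))
```

On this toy data the rule separates index pages from phishing URLs perfectly but misclassifies every legitimate login URL, which is the kind of drop the abstract attributes to training data without login pages.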

Notes

  1. https://safebrowsing.google.com/.

  2. https://www.phishtank.com/.

  3. https://bit.ly/2OJDYBS.

  4. https://www.cs.waikato.ac.nz/ml/weka/.

  5. https://www.whois.net/.

  6. http://eprints.hud.ac.uk/24330/9/Mohammad14JulyDS.

  7. https://www.quantcast.com/products/measure-audience-insights/.

  8. https://selenium.dev/projects/.

  9. A verified URL on Phishtank needs five people to visit the URL and vote to be real phishing. This increases the reliability of these samples.

  10. Dataset available at: http://gvis.unileon.es/dataset/pilu-60k/.

  11. https://scikit-learn.org/.

Acknowledgement

This research was funded by the framework agreement between the University of León and INCIBE (Spanish National Cybersecurity Institute) under Addendum 01.

Author information

Correspondence to M. Sánchez-Paniagua, E. Fidalgo, V. González-Castro or E. Alegre.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sánchez-Paniagua, M., Fidalgo, E., González-Castro, V., Alegre, E. (2021). Impact of Current Phishing Strategies in Machine Learning Models for Phishing Detection. In: Herrero, Á., Cambra, C., Urda, D., Sedano, J., Quintián, H., Corchado, E. (eds) 13th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2020). CISIS 2019. Advances in Intelligent Systems and Computing, vol 1267. Springer, Cham. https://doi.org/10.1007/978-3-030-57805-3_9
