Phishing Attacks and Websites Classification Using Machine Learning and Multiple Datasets (A Comparative Analysis)

Part of the Lecture Notes in Computer Science book series (LNAI, volume 12465)

Abstract

Phishing attacks are the most common type of cyber-attack used to obtain sensitive information, and they affect individuals as well as organizations across the globe. Various techniques have been proposed to identify phishing attacks, in particular the deployment of machine intelligence in recent years. However, the algorithms and discriminating features used by these techniques vary widely across existing works. In this study, we present a comprehensive analysis of various machine learning algorithms and evaluate their performance over multiple datasets. We further investigate the most significant features within multiple datasets and compare classification performance on the resulting reduced-dimensional datasets. The statistical results indicate that random forest and artificial neural network outperform the other classification algorithms, achieving over 97% accuracy using the identified features.
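The comparison described in the abstract (rank features, then evaluate a classifier on the full and on the reduced feature set) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data is synthetic, features are ranked by absolute Pearson correlation with the label, and a simple k-nearest-neighbour classifier stands in for the random forest and neural network models the study reports. The helper names (`make_toy_dataset`, `rank_features`, `holdout_accuracy`) are hypothetical.

```python
import math
import random


def make_toy_dataset(n_samples=400, n_features=6, flip_rate=0.05, seed=0):
    """Synthetic stand-in for a phishing dataset (hypothetical, not from the paper).

    Each feature is -1 or 1. Only features 0 and 1 carry signal; the rest are noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.choice((-1, 1)) for _ in range(n_features)]
        label = 1 if row[0] + row[1] >= 0 else 0
        if rng.random() < flip_rate:  # small amount of label noise
            label = 1 - label
        X.append(row)
        y.append(label)
    return X, y


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def rank_features(X, y):
    """Feature indices sorted by |correlation with the label|, strongest first."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def knn_predict(train_X, train_y, query, k=5):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = sum(train_y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0


def holdout_accuracy(X, y, columns, n_train=300, k=5):
    """Train on the first n_train rows, test on the rest, using only `columns`."""
    def project(row):
        return [row[j] for j in columns]

    train_X = [project(r) for r in X[:n_train]]
    test_X = [project(r) for r in X[n_train:]]
    hits = sum(
        knn_predict(train_X, y[:n_train], q, k) == t
        for q, t in zip(test_X, y[n_train:])
    )
    return hits / len(test_X)


N_TRAIN = 300
X, y = make_toy_dataset()
# Rank features on the training portion only, so the test rows stay unseen.
ranking = rank_features(X[:N_TRAIN], y[:N_TRAIN])
top_features = ranking[:2]
acc_full = holdout_accuracy(X, y, list(range(len(X[0]))), n_train=N_TRAIN)
acc_reduced = holdout_accuracy(X, y, top_features, n_train=N_TRAIN)

print("feature ranking:", ranking)
print(f"accuracy, all features:   {acc_full:.3f}")
print(f"accuracy, top 2 features: {acc_reduced:.3f}")
```

Ranking on the training rows alone keeps the held-out rows out of the feature-selection step, which would otherwise inflate the reduced-set accuracy. On real data, the same comparison would normally use cross-validation rather than a single split.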

Keywords

  • Phishing attacks
  • Cyber security
  • Phishing emails
  • Information security
  • Security and privacy
  • Phishing classification
  • Phishing websites detection

  • DOI: 10.1007/978-3-030-60796-8_26
  • Chapter length: 13 pages

Corresponding author

Correspondence to Wasiq Khan.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Khan, S.A., Khan, W., Hussain, A. (2020). Phishing Attacks and Websites Classification Using Machine Learning and Multiple Datasets (A Comparative Analysis). In: Huang, DS., Premaratne, P. (eds) Intelligent Computing Methodologies. ICIC 2020. Lecture Notes in Computer Science, vol 12465. Springer, Cham. https://doi.org/10.1007/978-3-030-60796-8_26

  • DOI: https://doi.org/10.1007/978-3-030-60796-8_26

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60795-1

  • Online ISBN: 978-3-030-60796-8

  • eBook Packages: Computer Science, Computer Science (R0)