Implementation of Machine Learning and Data Mining to Improve Cybersecurity and Limit Vulnerabilities to Cyber Attacks

  • Mohamed Alloghani
  • Dhiya Al-Jumeily
  • Abir Hussain
  • Jamila Mustafina
  • Thar Baker
  • Ahmed J. Aljaaf
Part of the Studies in Computational Intelligence book series (SCI, volume 855)


Of the many challenges that continue to make cyber-attack detection elusive, lack of training data remains the biggest. Even though organizations and businesses turn to established network monitoring tools such as Wireshark, millions of people remain vulnerable because they lack information about the website behaviors and features that can indicate an attack. In fact, most attacks succeed not because threat actors resort to complex coding and evasion techniques but because victims lack the basic tools to detect and avoid them. Despite these challenges, machine learning is changing the understanding of the nature of cyber-attacks. This study applied machine learning techniques to phishing website data with the objective of comparing five algorithms and providing insight that the general public can use to avoid phishing pitfalls. The findings suggest that the Neural Network is the best-performing algorithm, and the model identifies the following as the leading features of phishing websites: an IP address in the domain name, a longer URL, use of URL shortening services, the “@” symbol in the URL, the “−” symbol in the URL, non-trusted SSL certificates with an expiry duration of less than 6 months, domains registered for less than one year, and favicons redirecting from other URLs. The Neural Network is based on the multi-layer perceptron and provides a basis for intelligence, so that in future phishing detection can be automated and treated as an artificial intelligence task.
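Several of the features listed above can be read directly from a URL string. The following is a minimal sketch of such checks, not the study's implementation: the function names, the length threshold, and the list of shortening services are hypothetical values chosen for illustration, since the abstract does not specify them. The SSL certificate, domain age, and favicon features need network lookups and are left out.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical values: the abstract gives no threshold or shortener list.
LONG_URL_THRESHOLD = 54
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "goo.gl", "t.co"}


def host_is_ip(host: str) -> bool:
    """Return True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Compute the URL-only indicators named in the abstract."""
    host = urlsplit(url).hostname or ""
    return {
        "ip_in_domain": host_is_ip(host),
        "long_url": len(url) >= LONG_URL_THRESHOLD,
        "shortening_service": host in SHORTENER_HOSTS,
        "at_symbol": "@" in url,
        # The abstract's dash symbol is assumed to mean the ASCII hyphen.
        "dash_symbol": "-" in url,
    }


if __name__ == "__main__":
    print(url_features("http://legit.com@evil-site.example/login"))
```

Each flag is only a warning sign on its own; in the study's framing, such features serve as inputs to a classifier and are not a verdict in themselves.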


Keywords: Data mining, Machine learning, Cybersecurity, Phishing websites



The challenge of accessing reliable cybersecurity datasets is well documented and common among researchers. As such, we are grateful to Rami Mustafa and Lee McCluskey of the University of Huddersfield and Fadi Thabtah of the Canadian University of Dubai for preparing and sharing the data.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Mohamed Alloghani (1, 2)
  • Dhiya Al-Jumeily (1)
  • Abir Hussain (1)
  • Jamila Mustafina (3)
  • Thar Baker (1)
  • Ahmed J. Aljaaf (1, 4)

  1. Liverpool John Moores University, Liverpool, UK
  2. Abu Dhabi Health Services Company (SEHA), Abu Dhabi, UAE
  3. Kazan Federal University, Kazan, Russia
  4. Centre of Computer, University of Anbar, Ramadi, Iraq
